Snowflakes, Stars, Safety

In the wake of the big Snowflake news … I liked Garry’s post.

Snowflake was a revelation in how much faster you could make your data warehouse Narrator is the next wave of how to answer more useful business questions faster and with far less complexity and cost

Garry Tan

I then went to Narrator and learned about star model vs snowflake model (and universe model).

This old Oracle link is super quaint and has GIFs so it’s ooooold.

A star schema model can be depicted as a simple star: a central table contains fact data and multiple tables radiate out from it, connected by the primary and foreign keys of the database. In a star schema implementation, Warehouse Builder stores the dimension data in a single table or view for all the dimension levels.

For example, if you implement the Product dimension using a star schema, Warehouse Builder uses a single table to implement all the levels in the dimension, as shown in the screenshot. The attributes in all the levels are mapped to different columns in a single table called PRODUCT.


versus snowflake …

The snowflake schema represents a dimensional model which is also composed of a central fact table and a set of constituent dimension tables which are further normalized into sub-dimension tables. In a snowflake schema implementation, Warehouse Builder uses more than one table or view to store the dimension data. Separate database tables or views store data pertaining to each level in the dimension.

The screenshot displays the snowflake implementation of the Product dimension. Each level in the dimension is mapped to a different table.


As to when to use one over the other according to Ralph Kimball:

  • “Large customer dimensions where, for example, 80 percent of the fact table measurements involve anonymous visitors about whom you collect little detail, and 20 percent involve reliably registered customers about whom you collect much detailed data by tracking many dimensions.”*
  • “Financial product dimensions for banks, brokerage houses, and insurance companies, because each of the individual products has a host of special attributes not shared by other products.”*
  • “Multienterprise calendar dimensions because each organization has idiosyncratic fiscal periods, seasons, and holidays.”*

Notes on

I had the opportunity to hear from an expert on why is doing so well. It’s in the same class of other DWaaS (Data Warehouse as a Service) with Google Big Query, Amazon Redshift, and Microsoft Azure Synapse. The upsides are:

  • low complexity (install, admin, maintain handled by others while all you need to do is play with your data)
  • lower cost (pay-as-you-go for computes and fully leverage cloud)
  • high performance by default (no tuning, maintaining, etc)
  • elasticity (scale up or scale down per demand — capacity planning is no longer needed)
  • faster time to market (easy to provision and quick deploy)
  • safety: improved security, business continuity, disaster recovery (encrypted by default, replication available, and availability zones)

What’s interesting about Snowflake is that it’s cloud platform agnostic. It runs on all AWS, Azure, and GCP. It can store “semi-structured” data content typed as “Variant” — it lets you query nested and unstructured data.

Snowflake loads semi-structured data into a single VARIANT column.

from introduction to semi-structured data

Apparently it is great at “time travel” — i.e. undo. A command called “UNDROP” for a table lets you rewind to a different time. Apparently it’s highly performant, and “CLONE” appears to be even cooler.