Site icon John Maeda’s Blog

Snowflakes, Stars, Safety

In the wake of the big Snowflake news … I liked Garry’s post.

Snowflake was a revelation in how much faster you could make your data warehouse Narrator is the next wave of how to answer more useful business questions faster and with far less complexity and cost https://www.narrator.ai

Garry Tan

I then went to Narrator and learned about star model vs snowflake model (and universe model).

This old Oracle link is super quaint and has GIFs so it’s ooooold.

A star schema model can be depicted as a simple star: a central table contains fact data and multiple tables radiate out from it, connected by the primary and foreign keys of the database. In a star schema implementation, Warehouse Builder stores the dimension data in a single table or view for all the dimension levels.

For example, if you implement the Product dimension using a star schema, Warehouse Builder uses a single table to implement all the levels in the dimension, as shown in the screenshot. The attributes in all the levels are mapped to different columns in a single table called PRODUCT.

Oracle

versus snowflake …

The snowflake schema represents a dimensional model which is also composed of a central fact table and a set of constituent dimension tables which are further normalized into sub-dimension tables. In a snowflake schema implementation, Warehouse Builder uses more than one table or view to store the dimension data. Separate database tables or views store data pertaining to each level in the dimension.

The screenshot displays the snowflake implementation of the Product dimension. Each level in the dimension is mapped to a different table.

Oracle

As to when to use one over the other according to Ralph Kimball:

Notes on Snowflake.com

I had the opportunity to hear from an expert on why Snowflake.com is doing so well. It’s in the same class of other DWaaS (Data Warehouse as a Service) with Google Big Query, Amazon Redshift, and Microsoft Azure Synapse. The upsides are:

What’s interesting about Snowflake is that it’s cloud platform agnostic. It runs on all AWS, Azure, and GCP. It can store “semi-structured” data content typed as “Variant” — it lets you query nested and unstructured data.

Snowflake loads semi-structured data into a single VARIANT column.

from introduction to semi-structured data

Apparently it is great at “time travel” — i.e. undo. A command called “UNDROP” for a table lets you rewind to a different time. Apparently it’s highly performant, and “CLONE” appears to be even cooler.

Exit mobile version