Sequences in Snowflake - Unsung Data Integrity Heroes
Updated: Feb 21
Sequences are the unsung heroes of a data application. I like to think about sequences in two forms:
Surrogate keys - the auto number sequences assigned to a table to track rows
Natural keys (IDs) - sequences that form groupings of business logic
Surrogate keys are the backbone of the data warehouse. This article by Robert Kimball forms the philosophical basis of surrogate keys.
The Kimball Group definition of Surrogate Keys
The key takeaways on surrogate keys are as follows:
They are the foundation of the data warehouse
Create them as anonymous integers using an automatic number sequence
They protect you from changes in the incoming data as it changes over time
They are responsible for the data integrity of the warehouse
In snowflake, the best habit is to assign a surrogate key to every table as an automated row increment and give it the KEY suffix along with the table name to emphasize its attachment to the data in the table.
So if I created a dimensional table on my New Music for Data Analytics sandbox to contain the hosts for each show in the radio station, the table name would be WAREHOUSE.DIM_HOST and the primary Surrogate key is DIM_HOST_KEY.
CREATE TABLE IF NOT EXISTS WAREHOUSE.DIM_HOST ( DIM_HOST_KEY INT PRIMARY KEY AUTOINCREMENT START 1 INCREMENT 1, HOST_ID INT NOT NULL, NAME VARCHAR NULL, URI VARCHAR NULL, IMAGE_URI VARCHAR NULL, THUMBNAIL_URI VARCHAR NULL, IS_ACTIVE BOOLEAN NULL, DW_ACTIVE BOOLEAN DEFAULT TRUE NOT NULL, DW_FROM_DATE TIMESTAMPLTZ, DW_TO_DATE TIMESTAMPLTZ DEFAULT CAST('2099-12-31 00:00:00' AS TIMESTAMP_LTZ(9)) NOT NULL, DW_CREATE_DATE TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP(), DW_CREATE_USER VARCHAR NOT NULL DEFAULT CURRENT_USER(), DW_UPDATE_DATE TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP(), DW_UPDATE_USER VARCHAR NOT NULL DEFAULT CURRENT_USER() );
Natural Keys (IDs)
Natural keys create a meaningful way to identify records. Again, the best practice is to use these values as integers. It should have the NOT NULL restriction on the table to prevent a row without meaningful identification from entering the warehouse. For the KEXP app, these keys are created by the incoming data feed.
Sometimes we need to create sequences that don't necessarily fall into a simple auto-increment pattern.
For this example, I have an analytics table where the auto-increment took the order when a host was added as default. Still, since my favorites change over time, this value has no meaning outside of the last time I recorded my preference.
insert into ANALYTICS.FAVORITE_HOST(DIM_HOST_KEY) with fav_hosts as (select DIM_HOST_KEY, HOST_ID, NAME, URI, IMAGE_URI, THUMBNAIL_URI, IS_ACTIVE from WAREHOUSE.DIM_HOST where NAME in ('Cheryl Waters', 'Evie', 'Gabriel Teodros')) select DIM_HOST_KEY from fav_hosts;
This table uses the following sequence to register favorites in the order they were added automatically.
CREATE OR REPLACE SEQUENCE ANALYTICS.FAVORITE_SEQUENCE START = 1 INCREMENT = 1 ;
No CURRVAL for Current Value
One drawback Snowflake has is the lack of a CURRVAL method to get the current value. The rationale is that Snowflake doesn't want the user to use small, single-row queries.