top of page
Search
  • Writer's pictureTim Burns

Data Warehousing, the Three-Tier Data Pattern, and Normal Forms (2NF, 3NF)


Boston Public Library
Boston Public Library

The Data Warehouse (Lakehouse)

So we want to store all our data in a data warehouse to make business decisions based on data. Every data warehouse has some form of the Three-Tier Pattern. The names may or may not be consistent, but usually, there are three major levels: The Landing Zone, The Processing Layer, and the Data Warehouse.


The Landing Zone

Data in the Landing Zone Layer is ingested from external sources. They might be other databases, CSV files, JSON files, etc. It contains a complete representation of all the data for every source of our data warehouse and is stored in its native format. In this layer, we check for missing or inconsistent data. Databricks calls this layer the "Bronze" layer in their medallion architecture.


The Processing Layer

The Processing Layer is where the data is processed and organized. Generally, the result of the processing layer is transforming the landing zone data into the Second or Third Normal Form (2NF, 3NF). The normal forms are layers where operational data is normalized into structures that facilitate query performance.


The 2NF and 3NF in the Processing Layer

The 2NF is a normalized form where all the data is split into parts representing a specific business concept, for example, retailer, vendor, customer, or product. The 3NF is a further refinement that normalizes data so that an insert or an update writes data into only one place.


Design Principle: The 3NF Normalizes data so that an insert or update writes only into one place.

Examples of Processing Layer Transformations:

  • De-duplication

  • Normalize data across sources

  • Validate data

  • Pivot data to a single primary key attribute

The Data Warehouse Layer

The Data Warehouse Layer is for Analytics and Reporting. The Data Warehouse Layer aims to reduce the processing layer's complexity to a Star Schema to optimize the data for querying.

A common mistake is confusing the 3NF and the Data Warehouse. Normalization will introduce a tremendous amount of unnecessary complexity, and the purpose of the Data Warehouse is for the end user that builds reports for analysis. The Star Schema reduces the complexity of the 3NF and presents a simplified and unified schema for analysis.


A litmus test on whether you are trying to use the 3NF as a Data Warehouse is that your tables have too many unnecessary or empty details.

17 views0 comments

Recent Posts

See All

Carto, Snowflake, and Data Management

A basic principle of data management: Don't move data unless you have to. Moving data is expensive and error-prone. Data Egress Cost: How To Take Back Control And Reduce Egress Charges Archiving to S

Music Festivals 2024

I am browsing music festivals I'd like to attend in 2024. I work for Evolve.com which manages vacation properties and my search came up with this site from one of our biggest competitors. Vacasa: US

bottom of page