Tim Burns

Keep Things Simple and Be a Better Data Engineer

Generally one thinks of Data Engineers as the people in the background, creating database schemas, writing SQL code, and working on transformation pipelines. That's my day-to-day job.


But you can make things easier on yourself by always looking for ways to reduce cognitive load.


Better Living through Naming

Give Your Projects Real-World Names

Don't fall for the corporate TLA (three-letter acronym) game by calling your project an acronym that only means something inside the company. Call your project what it is. Is it a Data Warehouse? Is it a Data Mart? Is it an API? This means you should be familiar with industry terms. You should be able to explain what you do to someone who doesn't work for your company but is also a Data Engineer.


Name Tables with Names that Show their Function and Business Reason

When you look through tables, you should be able to easily identify the function and type of data in the table by the name, and that should be consistent.


Say you have a table you use to COPY CUSTOMER data into a stage location. Call that table CPY_CUSTOMER. You can abbreviate with CPY here because only internal folks will see this table, and we want a way to recognize it instantly as internal.


When that table is used to stage the data, call it STG_CUSTOMER.


When that table is used in a data warehouse, call it CUSTOMER_DIM. Reverse the order here, because the people who mostly look at warehouse tables will look first for the business usage and second for the table type.
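One way to keep the convention consistent is to generate names instead of typing them by hand. The sketch below is a hypothetical helper (the `table_name` function and the layer keys "copy", "stage", "dim", "fact" are my own illustration); only the CPY_, STG_, _DIM and _FACT patterns come from the text.

```python
# Hypothetical helper illustrating the naming convention described above.
# Internal layers put the type first (CPY_, STG_); warehouse tables put the
# business name first and the table type last (_DIM, _FACT).
PREFIXES = {"copy": "CPY_", "stage": "STG_"}
SUFFIXES = {"dim": "_DIM", "fact": "_FACT"}


def table_name(entity: str, layer: str) -> str:
    """Return the conventional table name for a business entity in a layer."""
    base = entity.strip().upper()
    if layer in PREFIXES:
        return PREFIXES[layer] + base
    if layer in SUFFIXES:
        return base + SUFFIXES[layer]
    raise ValueError(f"unknown layer: {layer!r}")


if __name__ == "__main__":
    for layer in ("copy", "stage", "dim"):
        print(table_name("customer", layer))
    # CPY_CUSTOMER
    # STG_CUSTOMER
    # CUSTOMER_DIM
```

Raising on an unknown layer is deliberate: a typo should fail loudly rather than quietly produce a table name nobody recognizes.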


If a table is in a Data Warehouse, call out in the name whether it is a FACT or a DIM. So if you have a data warehouse with a Star Schema, make that clear in how you name both the schema and the tables.


Here the Data Warehouse schema is clearly identified as a Data Warehouse (DW), and each of the tables has a clear name. Both as a developer and as an analytics user, I can quickly identify the relationships between these tables without looking deeper, so I keep my cognitive load low.

SCHEMA  TABLE
DW      SALES_FACT
DW      CUSTOMER_DIM
DW      ITEM_DIM
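To make the DW schema listed above concrete, here is a minimal sketch that builds it in SQLite and then classifies each table purely by its name. The choice of SQLite is an assumption for illustration: it has no CREATE SCHEMA, so an attached in-memory database named DW plays the role of the schema. The column names and the `table_kind` helper are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for the real warehouse. An attached
# in-memory database named DW plays the role of the DW schema.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS DW")

conn.executescript("""
CREATE TABLE DW.CUSTOMER_DIM (
    CUSTOMER_ID   INTEGER PRIMARY KEY,
    CUSTOMER_NAME TEXT
);
CREATE TABLE DW.ITEM_DIM (
    ITEM_ID   INTEGER PRIMARY KEY,
    ITEM_NAME TEXT
);
-- SQLite resolves foreign keys within the same database,
-- so the referenced tables are written without the DW. prefix.
CREATE TABLE DW.SALES_FACT (
    CUSTOMER_ID INTEGER REFERENCES CUSTOMER_DIM (CUSTOMER_ID),
    ITEM_ID     INTEGER REFERENCES ITEM_DIM (ITEM_ID),
    SALE_DATE   TEXT,
    AMOUNT      REAL
);
""")


def table_kind(name: str) -> str:
    """Classify a warehouse table from its naming suffix alone."""
    if name.endswith("_FACT"):
        return "fact"
    if name.endswith("_DIM"):
        return "dimension"
    return "unknown"


# List every table in the DW schema and report its role.
rows = conn.execute(
    "SELECT name FROM DW.sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()
for (name,) in rows:
    print(f"DW.{name}: {table_kind(name)}")
```

Note that `table_kind` never inspects columns or keys; the name alone is enough to tell a fact table from a dimension, which is exactly the low cognitive load the naming convention is after.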

Keeping a low cognitive load gives space for your brain to do the big thinking.


Keep Unit Tests Simple

The purpose of unit tests is to make it easy to determine whether your code is working. Keep each unit test simple. If your code runs a SQL statement, test it with something trivial like


SELECT CURRENT_DATE()


Focus on structure and on determining whether the code works. Don't get bogged down in details.
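A minimal sketch of such a smoke test, assuming Python's `unittest` and SQLite as a stand-in database. One caveat: SQLite spells the function `CURRENT_DATE` without parentheses, while the `CURRENT_DATE()` form works in engines such as MySQL and Snowflake. The `run_query` function is a hypothetical stand-in for your own data-access code.

```python
import datetime
import sqlite3
import unittest


def run_query(conn: sqlite3.Connection, sql: str) -> list:
    """Hypothetical data-access function under test: run SQL, return all rows."""
    return conn.execute(sql).fetchall()


class RunQuerySmokeTest(unittest.TestCase):
    def setUp(self) -> None:
        # A real (in-memory) database connection, not a mock.
        self.conn = sqlite3.connect(":memory:")

    def tearDown(self) -> None:
        self.conn.close()

    def test_trivial_query_round_trips(self) -> None:
        # SQLite writes CURRENT_DATE without parentheses.
        rows = run_query(self.conn, "SELECT CURRENT_DATE")
        self.assertEqual(len(rows), 1)
        # The value should parse as an ISO date (YYYY-MM-DD). It is not
        # compared with today's date, because SQLite reports UTC and the
        # local date may differ.
        datetime.date.fromisoformat(rows[0][0])


if __name__ == "__main__":
    suite = unittest.TestLoader().loadTestsFromTestCase(RunQuerySmokeTest)
    unittest.TextTestRunner(verbosity=2).run(suite)
```

The test checks only that the plumbing works: a connection exists, SQL goes in, and a well-formed row comes back. That is the structural question; detailed business logic can get its own tests if it earns them.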


Test-First Is Fine, but It Isn't a Religion

Test-first is a concept from Extreme Programming and it does allow you to focus on the API. However, like most of what comes from XP, it borders on religious bullshit. Don't overdo test-first, and make sure your tests actually test the code.


You don't NEED a test for every single method. Tests carry a lot of overhead. Test the code that does real work, and think nothing of removing tests.


Mock objects are also highly overrated. Move the dependencies out of your code and run real tests. For example, connect to the database in the test and pass the connection in to your code.
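Here is a minimal sketch of that idea, assuming SQLite as the test database and a hypothetical `load_customers` function and STG_CUSTOMER column layout. The function never opens its own connection; the caller passes one in, so the test can hand it a real in-memory database instead of a mock.

```python
import sqlite3


def load_customers(conn: sqlite3.Connection, customers: list[tuple[int, str]]) -> int:
    """Hypothetical loader: insert (id, name) rows into STG_CUSTOMER.

    The connection is a parameter, so this function does not know or care
    whether it is talking to production or to a throwaway test database.
    Returns the total number of rows in STG_CUSTOMER after the insert.
    """
    with conn:  # commits on success, rolls back on an exception
        conn.executemany(
            "INSERT INTO STG_CUSTOMER (CUSTOMER_ID, CUSTOMER_NAME) VALUES (?, ?)",
            customers,
        )
    return conn.execute("SELECT COUNT(*) FROM STG_CUSTOMER").fetchone()[0]


def test_load_customers() -> None:
    # Real database, real SQL: no mock objects involved.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE STG_CUSTOMER (CUSTOMER_ID INTEGER PRIMARY KEY, CUSTOMER_NAME TEXT)"
    )
    try:
        count = load_customers(conn, [(1, "Acme"), (2, "Globex")])
        assert count == 2
        names = [row[0] for row in conn.execute(
            "SELECT CUSTOMER_NAME FROM STG_CUSTOMER ORDER BY CUSTOMER_ID")]
        assert names == ["Acme", "Globex"]
    finally:
        conn.close()


if __name__ == "__main__":
    test_load_customers()
    print("ok")
```

In the pipeline, the same `load_customers` receives the production connection; only the caller changes. Because the test runs real SQL against a real engine, it catches mistakes such as a misspelled column name that a mock would happily accept.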
