Data Analytics for Software Engineers
Photo by Gabi Scott on Unsplash
Data Analytics is a strange beast in the world of software development. On one hand, it is highly hyped, brimming with powerful tools, and forms the "wow factor" of software products. On the other, it is not always understood, often lacks the rigor of software engineering, and incorrectly occupies a lower niche in the unspoken pecking order of developers.
Data Analytics is the end goal of any data pipeline. It relies heavily on the software engineering that makes the connections, downloads the data, processes it, and serves it, but it is where the buck stops. It is where the end-user finds value in the data, and for that reason alone, every developer on the stack that delivers the analytics should know three basic analytical principles:
Refine the data to its finest level of relevant detail
Don't impress with analytics fireworks - express with simple clarity
Define effective metrics clearly and be consistent
In this article, I will use Metabase, a much-underrated business analytics tool, to illustrate these concepts. The examples come from a public domain data product I maintain, built on the radio playlist data of the public Seattle radio station KEXP.
Use Basic Business Facts for Proper Granularity
Getting the proper granularity requires an understanding of the business. If you've never worked on a data pipeline before, I recommend The Data Warehouse Toolkit, 3rd Edition, by Kimball.
Read and understand star schemas and the concept of surrogate keys. More importantly, understand the common granularities for your problem domain.
The Kimball book outlines the facts that should form the foundation of the analytics endpoint. Each business domain has a lowest common denominator, its grain, that the analyst can use in the tool to find value in the data and drive business decisions.
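To make the star schema and surrogate key ideas concrete, here is a minimal sketch in Python using the standard library's sqlite3 module. The table names, column names, and sample values (`dim_artist`, `dim_host`, `fact_play`, "Host A", "Artist X") are assumptions made up for illustration; they are not the schema of the KEXP data product or anything prescribed by the Kimball book.

```python
import sqlite3

# Hypothetical star schema: one fact table referencing two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_artist (
    artist_key  INTEGER PRIMARY KEY,   -- surrogate key assigned by the warehouse
    artist_name TEXT NOT NULL UNIQUE   -- natural key as it arrives from the source
);
CREATE TABLE dim_host (
    host_key  INTEGER PRIMARY KEY,
    host_name TEXT NOT NULL UNIQUE
);
CREATE TABLE fact_play (
    play_key   INTEGER PRIMARY KEY,
    artist_key INTEGER NOT NULL REFERENCES dim_artist(artist_key),
    host_key   INTEGER NOT NULL REFERENCES dim_host(host_key),
    song       TEXT NOT NULL,
    played_on  TEXT NOT NULL           -- ISO date string
);
""")


def surrogate_key(table, key_col, name_col, name):
    """Return the surrogate key for a natural key, adding the row if it is new."""
    conn.execute(f"INSERT OR IGNORE INTO {table} ({name_col}) VALUES (?)", (name,))
    row = conn.execute(
        f"SELECT {key_col} FROM {table} WHERE {name_col} = ?", (name,)
    ).fetchone()
    return row[0]


def load_play(host, artist, song, played_on):
    """Insert one fact row, resolving each dimension to its surrogate key."""
    conn.execute(
        "INSERT INTO fact_play (artist_key, host_key, song, played_on) "
        "VALUES (?, ?, ?, ?)",
        (
            surrogate_key("dim_artist", "artist_key", "artist_name", artist),
            surrogate_key("dim_host", "host_key", "host_name", host),
            song,
            played_on,
        ),
    )


# Made-up sample rows: the same artist played by two different hosts.
load_play("Host A", "Artist X", "Song 1", "2020-01-01")
load_play("Host B", "Artist X", "Song 2", "2020-01-02")
```

The fact table stores only keys and the event's own attributes, so renaming an artist in the source means updating a single dimension row rather than every play.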
Here are some common use cases:
Retail and Supply Chain
Retail Sales - The point-of-sale quantity, amount, discounts, etc.
Inventory - Store Quantity on Hand
Procurement - Procurement transaction quantity and amount
Claims - Billed amount, paid amount, insurance amount, etc.
Playlist - Host, Artist, Song, and date
In this article, I focus on the last of these, the playlist, to illustrate analytics concepts using a simple model.
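As a rough sketch of why the finest grain matters, the Python below keeps one record per play (host, artist, song, date) and derives coarser summaries from it on demand. The rows and the `count_by` helper are hypothetical and exist only for illustration; they are not real KEXP data.

```python
from collections import Counter
from datetime import date

# One row per play: the finest relevant grain for a playlist (made-up values).
plays = [
    {"host": "Host A", "artist": "Artist X", "song": "Song 1", "played_on": date(2020, 1, 1)},
    {"host": "Host A", "artist": "Artist Y", "song": "Song 2", "played_on": date(2020, 1, 1)},
    {"host": "Host B", "artist": "Artist X", "song": "Song 3", "played_on": date(2020, 1, 2)},
]


def count_by(rows, *fields):
    """Count rows grouped by the given fields; each group key is a tuple."""
    return Counter(tuple(row[f] for f in fields) for row in rows)


# Any coarser view can be computed from the play-level rows.
plays_per_artist = count_by(plays, "artist")
plays_per_host_per_day = count_by(plays, "host", "played_on")
```

Had the data been stored only as plays per artist, the per-host and per-day views could not be recovered. Keeping the play-level rows leaves every roll-up available to the analyst.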