Big data systems development is full of complexity and, more often than not, interesting and elegant shortcuts.
The body of tools and literature is vast, and some of it is not readily accessible, so deliberate studying and reading pays off.
This Software Engineering Daily podcast episode with Josh Wills describes Slack's big data systems.
Transcript here: https://softwareengineeringdaily.com/wp-content/uploads/2020/01/SED980-Josh-Wills-Slack-Search.pdf
Cool Ideas:
The Law of Leaky Abstractions is an important basic principle
JSON has no built-in schema governance, which creates extra work when normalizing data
Choose Boring Technology
The ideas are well-known and you can get help
It's proven to work
Slack is a Product-Driven Company
"JW: Slack is by far the most product-oriented company I have ever worked at"
Snowflake does a great job of abstracting the ETL SQL and the Analytics SQL
Data Ingestion is still a lot of work
Porter's Five Forces Framework is a method for analyzing the competitive forces acting on a business
SQL is boring technology, but so simple that it is vast
SQL grew out of work by a team at IBM in the early 1970s (the original SEQUEL paper was published in 1974), and yet it remains one of the most widely used languages today. Most modern languages and IDEs provide strong support for SQL, and it remains one of the most important tools in a developer's toolkit.
SQL is so powerful because you don't need to be a developer to use it. In fact, an entire job class of "Analyst" revolves around using SQL to get business insights from data.
The secret sauce of SQL is that it operates on sets. You describe the set you want, and the backend engine and hardware decide how to compute it, which lets them process massive amounts of data in parallel. That is the primary reason why using a transformation to process data is such an important idea. A transformation can take raw data and cleanse it into a set that makes more sense. That set can be transformed again, and so on. Sets can be sliced and diced in a visual tool to understand relationships. It is so simple that it is vast.
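To make the "set in, set out" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and view names (raw_events, clean_events, messages_per_user) and the sample rows are made up for illustration and have nothing to do with Slack's actual schema. The first view cleanses the raw set, and the second view transforms the cleansed set again.

```python
import sqlite3


def run_pipeline():
    """Chain two SQL transformations over a small, hypothetical raw data set."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Hypothetical raw data: inconsistent casing, stray spaces, a bad count.
        CREATE TABLE raw_events (user_name TEXT, channel TEXT, msg_count TEXT);
        INSERT INTO raw_events VALUES
            (' Alice ', 'General', '3'),
            ('alice',   'general', '2'),
            ('BOB',     'random',  'n/a'),
            ('bob',     'general', '5');

        -- Transformation 1: cleanse the raw set into a tidier set.
        -- Rows whose count does not start with a digit are dropped.
        CREATE VIEW clean_events AS
        SELECT lower(trim(user_name))     AS user_name,
               lower(trim(channel))       AS channel,
               CAST(msg_count AS INTEGER) AS msg_count
        FROM raw_events
        WHERE msg_count GLOB '[0-9]*';

        -- Transformation 2: transform the cleansed set again (aggregate per user).
        CREATE VIEW messages_per_user AS
        SELECT user_name, SUM(msg_count) AS total
        FROM clean_events
        GROUP BY user_name;
    """)
    rows = conn.execute(
        "SELECT user_name, total FROM messages_per_user ORDER BY user_name"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for user_name, total in run_pipeline():
        print(user_name, total)
```

Neither view says how to loop over rows. Each one only describes the resulting set, which is what leaves the engine free to decide how to execute it.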