Tim Burns

Machine Learning will change everything

Updated: May 16

Our world is filled with layers of information - waiting to be comprehended (Author)

I have been researching how Machine Learning can be used to build domain models, quality controls, and natural language analysis around data pipelines. The deeper I delve into ML, the more convinced I am that it will radically change how we build data pipelines, even more than the recent transition from on-prem to cloud-based solutions did.

Business data needs grow faster than many data engineering teams can keep up with. Data engineers have many tools, such as Snowflake, AWS, Terraform, and dbt. However, orchestrating meaning and action in the data pipeline remains a persistent problem. Natural language processing engines like those from OpenAI offer an automated mechanism to connect components logically without manual intervention at each step. As a result, analysts and engineers can supervise the process of turning data into value, make connections that extend beyond siloed domains, and ultimately build better data products.

C. Samiullah [3] on testing and monitoring:

"In this way testing & monitoring are like battle armor. Too little and you are vulnerable. Too much, and you can barely move."


  1. D. Sculley et al. (2015) Hidden Technical Debt in Machine Learning Systems

  2. E. Samuylova (2020) Machine Learning in Production: Why You Should Care About Data and Concept Drift

  3. C. Samiullah (2020) Monitoring Machine Learning Models in Production

  4. D. Sato et al. (2019) Continuous Delivery for Machine Learning

  5. E. Breck et al. (2017) The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction

  6. S. Amershi et al. (2019) Software Engineering for Machine Learning: A Case Study

  7. D. Le et al. (2020) Baselines

  8. B. Nushi (2021) Responsible Machine Learning with Error Analysis

  9. J. Czakon (2023) ML Experiment Tracking: What It Is, Why It Matters, and How to Implement It

  10. B. Gao et al. (2017) Deep Label Distribution Learning with Label Ambiguity

  11. A. Ng (2023) Building a data pipeline

  12. B. Mathes et al. (2021) ML Metadata: Version Control for ML

  13. C. Wiley (2020) Key requirements for an MLOps foundation
