Metaplane’s Post

Metaplane reposted this

Kevin Hu, PhD

Chief SQL Wrangler & CEO at Metaplane | MIT research | Data-informed posts about data

Data quality testing without monitoring pipelines is like running a kitchen without inspecting ingredients. No matter how skilled the chef, bad ingredients ruin the dish. #dataengineering #dataquality #analytics

Eric Gonzalez

Husband & Father | Data Executive | Creator | Advising Executives on Leveraging Data for Strategic Decisions | Bridging the Gap Between Boardrooms and Tech Teams

1mo

There's only one person who understands the ingestion pipeline and won't allow others to make changes, because the associated tech debt acts as job security.

Merrill Albert

Enterprise Data Leader, Data Governance Officer, Data Thought Leader, Chief Data Officer, Fractional Governance, Data Evangelist, LinkedIn Top Data Governance Voice, creator of #CrimesAgainstData

1mo

That's far too clean. You're missing all the silos and the people trying to shoot holes in the infrastructure.

Yavin Owens

Enthusiastic Data Quality Expert, enabling organizations to optimize revenue with bespoke data quality tools

1mo

Also, too many cooks spoil the broth.

Ann Pickett

Analytics | Strategy | Operations

1mo

xkcd being accurate once again!

Ronan TREILLET

Data Architect | Certified Solution Architect AWS (SAA-C03) | Freelance

1mo

Yes, checks within your pipeline are important as monitoring, but if you have data quality issues, they are not your pipeline's problem, and associating pipelines with data quality often leads people to mix up the two concepts. Yes, you should test your ingredients, but if you need to put time (and money) somewhere, it is usually better to spend the effort checking your provider's process rather than checking their product once it arrives in the kitchen, even if, ideally, you need to do both. Because if you have to cook for thousands of people and you realize all of your meat is in fact made of plastic, it will be a little late to order new ingredients.

The reality is that we often ask data engineers to solve issues they didn't create, to understand undocumented dynamics within data sources, and to imagine complex solutions to work around terrible decisions people made, just because it's too late to change the provider's process.

Data quality is the responsibility of the data source: the same way people (hopefully) monitor code quality in their applications, they should monitor the quality of the data they produce. Data quality is part of the product exposed to your colleagues or your business partners. POs should be incentivized on this metric too.

Satya Choudhury

Director, Personalization & AI/ML Architecture at Fidelity Investments

1mo

Not only does it ruin the dish, it ruins the pan too. Applying certain data quality checks at the source greatly reduces the impact on the dish as well as the pan.

Fares Hasan

AI & Data Science Lead

1mo

The funny part is that most product and top management are oblivious to the detrimental effects of that ingestion pipeline. I would say it's also sad when they are caught off guard by how a small change in the data source schema blows up the engine.

Trygvi Zachariassen Laksafoss

Senior Manager AI&GenAI at Lundbeck

1mo

One huge Python script pulling data from flat files and Excel sheets.

Constant demand from the business to build dashboards and do analytics makes it difficult to invest more time in data quality and observability. Data quality and observability should be more popular than they are now.

Sibbir Ahmmed Sihan

An experienced data engineer who has solved problems with Azure, Data Factory, Databricks, Azure DevOps, SQL, Airflow, Git, Power BI DAX, Python, Linux, AWS, Docker, and dbt, among other tools. Holds an MSc in Data.

1mo

I don't like this visual representation.
