How to Ensure that Your ETL Pipelines Deliver High Quality Data | Intermix
Get the slides: https://www.datacouncil.ai/talks/architecting-the-data-lake-how-to-ensure-that-your-etl-pipelines-deliver-high-quality-data
Download slides of this talk: https://www.dataengconf.com/speaker/architecting-the-data-lake-how-to-ensure-that-your-etl-pipelines-deliver-high-quality-data?utm_source=youtube&utm_medium=social&utm_campaign=%20-%20DEC-BCN-18%20Slides%20Download
ABOUT THE TALK:
The data in your data lake and data warehouse is mission critical. It informs crucial company decisions, and your reputation is on the line when it is not accurate. As the data engineer in charge, you need confidence in your data pipeline execution, because complex ETL demands accuracy. But data pipelines change often, and even minor changes to DAGs and tasks can introduce regressions. These can have unintended impacts on tables and data flows that may not be discovered until much later, when the data has already been shipped to its end users.
Failures and DAG outages also occur. When you've fixed them, you need confidence that data is ‘flowing’ again and things are back to normal. In this talk, we'll cover examples of tests that can be run against your tables and data models. You will learn about the different classes of tests, how to set them up, and the important metrics to monitor. You'll be the enabler of accurate business decisions, with confidence in your data quality and no more guesswork or surprises.
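As a rough illustration of what tests run against a table can look like, here is a minimal sketch. It is not taken from the talk: the three checks (non-empty table, no NULL keys, unique keys), the function names run_checks and demo, and the orders table are all assumptions made for this example, and SQLite stands in for a real warehouse.

```python
import sqlite3


def run_checks(conn, table, key_column):
    """Run three illustrative data quality checks; return {check_name: passed}.

    The table and column names are interpolated into SQL, so they must come
    from trusted pipeline configuration, never from user input.
    """
    cur = conn.cursor()
    # Row count check: the table should not be empty after a load.
    row_count = cur.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    # Null check: the key column should never be NULL.
    null_count = cur.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {key_column} IS NULL"
    ).fetchone()[0]
    # Uniqueness check: count key values that appear more than once.
    duplicate_count = cur.execute(
        f"SELECT COUNT(*) FROM ("
        f"SELECT {key_column} FROM {table} "
        f"WHERE {key_column} IS NOT NULL "
        f"GROUP BY {key_column} HAVING COUNT(*) > 1)"
    ).fetchone()[0]
    return {
        "non_empty": row_count > 0,
        "no_null_keys": null_count == 0,
        "unique_keys": duplicate_count == 0,
    }


def demo():
    """Build a hypothetical 'orders' table with one duplicate and one NULL key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 9.5), (2, 20.0), (2, 5.0), (None, 1.0)],
    )
    return run_checks(conn, "orders", "order_id")


if __name__ == "__main__":
    for name, passed in demo().items():
        print(f"{name}: {'PASS' if passed else 'FAIL'}")
```

In a scheduled pipeline, checks like these would typically run as a task after each load, with a failing check blocking downstream tasks or raising an alert. How the talk itself wires tests into DAGs is not stated in this description.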
ABOUT THE SPEAKER:
Paul is the co-founder & CTO of Intermix. He is experienced in building technology and products, scaling organizations, and establishing high-performing engineering cultures. He holds over 10 patents, and his prominent work includes bringing one of the first IaaS cloud computing service providers to market and developing a data analytics platform and mobile SDKs used by 1B end-users. He is currently solving problems in big data.
ABOUT DATA COUNCIL:
Data Council (https://www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers. Make sure to subscribe to our channel for more videos, including DC_THURS, our series of live online interviews with leading data professionals from top open source projects and startups.
FOLLOW DATA COUNCIL:
Twitter: https://twitter.com/DataCouncilAI
LinkedIn: https://www.linkedin.com/company/datacouncil-ai
Facebook: https://www.facebook.com/datacouncilai
Eventbrite: https://www.eventbrite.com/o/data-council-30357384520