
How to Ensure that Your ETL Pipelines Deliver High Quality Data | Intermix

Get the slides: https://www.datacouncil.ai/talks/architecting-the-data-lake-how-to-ensure-that-your-etl-pipelines-deliver-high-quality-data

Download slides of this talk: https://www.dataengconf.com/speaker/architecting-the-data-lake-how-to-ensure-that-your-etl-pipelines-deliver-high-quality-data?utm_source=youtube&utm_medium=social&utm_campaign=%20-%20DEC-BCN-18%20Slides%20Download
ABOUT THE TALK:

The data in your data lake and data warehouse is mission critical. It informs crucial company decisions, and your reputation is on the line when that data is not accurate. As the data engineer in charge, you need confidence in your data pipeline execution, and complex ETL demands accuracy when the data is mission critical. But data pipelines change often, and even minor changes to DAGs and tasks can introduce regressions. These can have unintended impacts on tables and data flows that may not be discovered until much later, when the data has already been shipped to its end users.

Failures and DAG outages also occur. Once you've fixed them, you need confidence that data is 'flowing' again and things are back to normal. In this talk, we'll cover examples of tests that can be run against your tables and data models. You will learn about the different classes of tests, how to set them up, and the important metrics to monitor. You'll be the enabler of accurate business decisions, with confidence in your data quality and no more guesswork or surprises.
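The abstract does not list the specific tests covered in the talk. As an illustration only, here is a minimal Python sketch of three checks that are commonly run against a table after a load: a volume check, a completeness check, and a freshness check. The function name `run_table_checks`, the `orders` table, its columns, and the 24-hour freshness threshold are all hypothetical, and an in-memory SQLite database stands in for a real warehouse.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def run_table_checks(conn, table, key_column, ts_column, max_age_hours=24, now=None):
    """Run three illustrative checks on one table; return {check_name: passed}.

    Table and column names are interpolated into SQL, so they must come from
    trusted configuration, never from user input.
    """
    now = now or datetime.now(timezone.utc)
    cur = conn.cursor()

    # Volume: the table should not be empty after a load.
    row_count = cur.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

    # Completeness: the key column should never be NULL.
    null_keys = cur.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {key_column} IS NULL"
    ).fetchone()[0]

    # Freshness: the newest row should be recent, a sign that data is flowing.
    # Timestamps are assumed to be stored as ISO 8601 strings with a UTC offset.
    latest = cur.execute(f"SELECT MAX({ts_column}) FROM {table}").fetchone()[0]
    fresh = (
        latest is not None
        and now - datetime.fromisoformat(latest) <= timedelta(hours=max_age_hours)
    )

    return {
        "row_count_positive": row_count > 0,
        "no_null_keys": null_keys == 0,
        "fresh": fresh,
    }


# Hypothetical sample data: two rows, one of which has a NULL key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, loaded_at TEXT)")
now = datetime(2018, 10, 11, 12, 0, tzinfo=timezone.utc)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        (1, (now - timedelta(hours=1)).isoformat()),
        (None, (now - timedelta(hours=2)).isoformat()),
    ],
)

results = run_table_checks(conn, "orders", "order_id", "loaded_at", now=now)
for name, passed in results.items():
    print(f"{name}: {'PASS' if passed else 'FAIL'}")
```

With this sample data the completeness check fails because one row has a NULL `order_id`, while the volume and freshness checks pass. In a real pipeline, checks like these would typically run as a task after each load, with the results recorded over time so that regressions and recoveries are visible.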

ABOUT THE SPEAKER:

Paul is the co-founder & CTO of Intermix. He's experienced in building technology and products, scaling organizations, and establishing high-performing engineering cultures. With over 10 patents, Paul's prominent work includes bringing one of the first IaaS cloud computing service providers to market, developing a data analytics platform and mobile SDKs used by 1B end-users, and currently solving problems in big data.

ABOUT DATA COUNCIL:
Data Council (https://www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers. Make sure to subscribe to our channel for more videos, including DC_THURS, our series of live online interviews with leading data professionals from top open source projects and startups.

FOLLOW DATA COUNCIL:
Twitter: https://twitter.com/DataCouncilAI
LinkedIn: https://www.linkedin.com/company/datacouncil-ai
Facebook: https://www.facebook.com/datacouncilai
Eventbrite: https://www.eventbrite.com/o/data-council-30357384520

Video: How to Ensure that Your ETL Pipelines Deliver High Quality Data | Intermix, from the Data Council channel
Video information
October 11, 2018, 15:48:13
00:32:11