Data Lineage with Apache Airflow | Datakin
ABOUT THE TALK (https://www.datacouncil.ai/talks/data-lineage-with-apache-airflow)
With Airflow now ubiquitous for DAG orchestration, organizations increasingly depend on Airflow to manage complex inter-DAG dependencies and provide up-to-date runtime visibility into DAG execution. But what effects (if any) would upstream DAGs have on downstream DAGs if dataset consumption were delayed?
In this talk, we introduce Marquez: an open source metadata service for the collection, aggregation, and visualization of a data ecosystem’s metadata. We will demonstrate how metadata management with Marquez helps maintain inter-DAG dependencies, catalog historical runs of DAGs, and minimize data quality issues.
ABOUT THE SPEAKER
Willy Lulciuc is a Software Engineer at Datakin. He makes datasets discoverable and meaningful with metadata. Previously, he worked on the Project Marquez team at WeWork. When he's not reviewing code and creating indirections, he can be found experimenting with analog synthesizers.
ABOUT DATA COUNCIL:
Data Council (https://www.datacouncil.ai/) is a community and conference series that provides data professionals with the learning and networking opportunities they need to grow their careers. Make sure to subscribe to our channel for more videos, including DC_THURS, our series of live online interviews with leading data professionals from top open source projects and startups.
FOLLOW DATA COUNCIL:
Twitter: https://twitter.com/DataCouncilAI
LinkedIn: https://www.linkedin.com/company/datacouncil-ai
Facebook: https://www.facebook.com/datacouncilai
Eventbrite: https://www.eventbrite.com/o/data-council-30357384520