"Dynamic Data Pipelining with Luigi" - Trey Hakanson (Pyohio 2019)
Trey Hakanson
https://www.pyohio.org/2019/presentations/93
As the scale of modern data has grown, so too has the need for modern tooling to handle it. Databases have had to become more horizontally scalable, less centralized, and more fault tolerant to meet the expectations of modern users. As a result, data warehousing and data engineering are relatively new disciplines, and engineers are still hard at work solving the core problems of this new sector. One problem of particular interest is dynamic data pipelining and workflows. Ingesting large amounts of data, transforming streams dynamically into a standardized format, and maintaining checkpoints and dependencies so that prerequisites are met before a given task begins are all difficult problems. This talk will describe how these problems can be solved using Luigi, Spotify’s robust tool for constructing complex data pipelines and workflows.
Luigi allows complex pipelines to be described programmatically, and each task can have multiple dependencies and dependents. This makes it suitable for a wide variety of batch jobs, and the optional centralized scheduler makes it easy to monitor job progress across data warehouses. In addition, Luigi’s robust checkpoint system allows pipelines to be resumed from the point at which they failed. Each task is well-defined, specifying its required inputs and resulting outputs, so creating or editing pipelines is a breeze.
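The task model described above (declared dependencies, declared outputs, and resuming after a failure) can be illustrated with a small plain-Python sketch. This is not Luigi's implementation and does not import Luigi. The `Task` base class, the `build` helper, and the `Extract`/`Transform` tasks are hypothetical names invented for this illustration; only the method names `requires`, `output`, `run`, and `complete` are borrowed from Luigi's `luigi.Task` interface. The assumption being illustrated is that a task counts as done once its output file exists, so a rerun skips finished work.

```python
import os
import tempfile


class Task:
    """Toy stand-in for a pipeline task (hypothetical, not luigi.Task)."""

    def __init__(self, workdir):
        self.workdir = workdir

    def requires(self):
        return []  # upstream tasks that must be complete first

    def output(self):
        raise NotImplementedError  # path of the file this task produces

    def run(self):
        raise NotImplementedError  # the actual work

    def complete(self):
        # Assumption: the output file doubles as the checkpoint.
        return os.path.exists(self.output())


class Extract(Task):
    def output(self):
        return os.path.join(self.workdir, "raw.txt")

    def run(self):
        with open(self.output(), "w") as f:
            f.write("3\n1\n2\n")


class Transform(Task):
    def requires(self):
        return [Extract(self.workdir)]

    def output(self):
        return os.path.join(self.workdir, "sorted.txt")

    def run(self):
        # Read the upstream task's output and write a sorted copy.
        with open(self.requires()[0].output()) as f:
            values = sorted(int(line) for line in f)
        with open(self.output(), "w") as f:
            f.write("\n".join(map(str, values)) + "\n")


def build(task, log):
    """Run `task` after its dependencies, skipping anything already complete."""
    if task.complete():
        return
    for dep in task.requires():
        build(dep, log)
    task.run()
    log.append(type(task).__name__)  # record which tasks actually ran


def demo():
    with tempfile.TemporaryDirectory() as workdir:
        first, second = [], []
        build(Transform(workdir), first)
        # Simulate losing only the final output, then resume the pipeline.
        os.remove(Transform(workdir).output())
        build(Transform(workdir), second)
        return first, second
```

Calling `demo()` runs both tasks on the first build; on the second build only `Transform` runs again, because the output of `Extract` is still on disk. In Luigi itself, tasks subclass `luigi.Task`, outputs are target objects such as `luigi.LocalTarget` rather than bare paths, and a pipeline can be started from Python with `luigi.build([...], local_scheduler=True)`.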
As the scale of modern data has grown, so has the need for tooling to handle its growing list of challenges. Whether performing reporting, bulk ingestion, or ETL processes, it is important to maintain flexibility and ensure proper monitoring. Luigi provides a robust toolkit for a wide variety of data pipelining tasks and can be easily integrated into existing workflows.
===
https://pyohio.org
A FREE annual conference for anyone interested in Python in and around Ohio, the entire Midwest, maybe even the whole world.
Produced by NDV: https://youtube.com/channel/UCQ7dFBzZGlBvtU2hCecsBBg?sub_confirmation=1
Sun Jul 28 16:15:00 2019 at Cartoon 2