Michał Karzyński - Developing elegant workflows in Python code with Apache Airflow
"Developing elegant workflows in Python code with Apache Airflow
[EuroPython 2017 - Talk - 2017-07-13 - Anfiteatro 1]
[Rimini, Italy]
Every time a new batch of data comes in, you start a set of tasks. Some tasks can run in parallel, some must run in sequence, perhaps on a number of different machines. That's a workflow.
Have you ever drawn a block diagram of your workflow? Imagine you could bring that diagram to life and actually run it as it looks on the whiteboard. With Airflow you can just about do that.
http://airflow.apache.org
Apache Airflow is an open-source Python tool for orchestrating data processing pipelines. In each workflow, tasks are arranged into a directed acyclic graph (DAG). The shape of this graph determines the overall logic of the workflow. A DAG can have many branches, and you can decide which of them to follow and which to skip at execution time.
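The DAG-with-branching idea above can be sketched in plain Python (this is a conceptual toy, not Airflow's actual API; the task names and the `run_dag` helper are illustrative):

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# A small DAG: "extract" feeds a branch task; the branch picks exactly
# one of two downstream paths, and the path not chosen is skipped.
# Mapping is node -> set of upstream (predecessor) tasks.
edges = {
    "branch":  {"extract"},
    "path_a":  {"branch"},
    "path_b":  {"branch"},
    "publish": {"path_a", "path_b"},
}

def run_dag(edges, chosen_branch):
    """Run tasks in topological order, skipping the branch not chosen,
    much as Airflow marks unchosen branches as 'skipped'."""
    skipped = {"path_a", "path_b"} - {chosen_branch}
    executed = []
    for task in TopologicalSorter(edges).static_order():
        if task in skipped:
            continue  # downstream logic never sees the skipped path
        executed.append(task)
    return executed

print(run_dag(edges, "path_a"))  # ['extract', 'branch', 'path_a', 'publish']
```

In real Airflow the same decision is made at execution time by a branching task that returns the ID of the path to follow.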
This makes for a resilient design, because each task can be retried multiple times if an error occurs. Airflow can even be stopped entirely, and running workflows will resume from the last unfinished task when it restarts. Logs for each task are stored separately and are easily accessible through a friendly web UI.
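The per-task retry behaviour described above boils down to a pattern like the following (a schematic stand-in for Airflow's per-task `retries` and `retry_delay` settings, not Airflow code):

```python
import time

def run_with_retries(task, retries=3, delay=0.0):
    """Call `task`; on failure, rerun it up to `retries` extra times
    before giving up and re-raising the last error."""
    for attempt in range(retries + 1):
        try:
            return task()
        except Exception:
            if attempt == retries:
                raise  # out of retries: let the failure propagate
            time.sleep(delay)  # back off before the next attempt

# A task that fails twice, then succeeds on the third attempt.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient failure")
    return "done"

print(run_with_retries(flaky))  # done
```

Because every attempt's logs are kept separately, a transient failure like this leaves a clear trail in the web UI.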
In my talk I will go over basic Airflow concepts and, through examples, demonstrate how easy it is to define your own workflows in Python code. We'll also go over ways to extend Airflow by adding custom task operators, sensors, and plugins.
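Custom operators follow a simple pattern: subclass Airflow's `BaseOperator` and put the work in an `execute(context)` method. A minimal stand-in showing the shape of that pattern (pure Python, assuming no Airflow install; the `GreetOperator` name is made up for illustration):

```python
class BaseOperator:
    """Toy stand-in for airflow.models.BaseOperator."""
    def __init__(self, task_id):
        self.task_id = task_id

    def execute(self, context):
        # Real operators override this; Airflow calls it when the
        # task instance runs, passing runtime context.
        raise NotImplementedError

class GreetOperator(BaseOperator):
    """A custom operator: real ones subclass Airflow's BaseOperator
    in exactly this way and do their work in execute()."""
    def __init__(self, task_id, name):
        super().__init__(task_id)
        self.name = name

    def execute(self, context):
        return f"Hello, {self.name}!"

task = GreetOperator(task_id="greet", name="EuroPython")
print(task.execute(context={}))  # Hello, EuroPython!
```

Sensors follow the same idea, except their method repeatedly checks a condition and the task only proceeds once it is met.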
License: This video is licensed under the CC BY-NC-SA 3.0 license: https://creativecommons.org/licenses/by-nc-sa/3.0/
Please see our speaker release agreement for details: https://ep2017.europython.eu/en/speaker-release-agreement/
Video: Michał Karzyński - Developing elegant workflows in Python code with Apache Airflow, from the EuroPython Conference channel