
Matt Davis: A Practical Introduction to Airflow | PyData SF 2016

Airflow is a pipeline orchestration tool for Python that allows users to configure multi-system workflows that are executed in parallel across workers. I’ll cover the basics of Airflow so you can start your Airflow journey on the right foot. This talk aims to answer questions such as: What is Airflow useful for? How do I get started? What do I need to know that’s not in the docs?

Airflow is a popular pipeline orchestration tool for Python that allows users to configure complex (or simple!) multi-system workflows that are executed in parallel across any number of workers. A single pipeline might contain bash, Python, and SQL operations. With dependencies specified between tasks, Airflow knows which ones it can run in parallel and which ones must run after others. Airflow is written in Python and users can add their own operators with custom functionality, doing anything Python can do.
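To make the dependency idea concrete, here is a minimal sketch in plain Python of how declared dependencies determine which tasks may run side by side and which must wait. It does not use Airflow's API and is not Airflow's scheduler; the task names and the parallel_batches helper are hypothetical, and the standard-library graphlib module (Python 3.9+) stands in for the dependency resolution.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "extract_users": set(),
    "extract_orders": set(),
    "join_tables": {"extract_users", "extract_orders"},
    "build_report": {"join_tables"},
}


def parallel_batches(deps):
    """Group tasks into batches whose members have all dependencies met.

    Tasks in the same batch do not depend on each other, so they could
    run in parallel; each batch must finish before the next one starts.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished to unlock dependents
    return batches


batches = parallel_batches(dependencies)
for number, batch in enumerate(batches, start=1):
    print(f"batch {number}: {batch}")
```

In this sketch the two extract tasks land in the first batch together, while join_tables and build_report each wait for the batch before them. A real scheduler is more fine-grained than this: it can start a task as soon as its own upstream tasks finish rather than waiting for a whole batch.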

Moving data through transformations and from one place to another is a big part of data science and engineering, but only two widely used orchestration systems for doing so are written in Python: Luigi and Airflow. We’ve been using Airflow (http://pythonhosted.org/airflow/) for several months at Clover Health and have learned a lot about its strengths and weaknesses. We use it to run several pipelines multiple times per day. One includes over 450 heavily linked tasks!

www.pydata.org

PyData is an educational program of NumFOCUS, a 501(c)(3) non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.

PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.

Video "Matt Davis: A Practical Introduction to Airflow | PyData SF 2016" from the PyData channel
Video information
August 25, 2016, 0:46:47