From Idea to Model: Productionizing Data Pipelines with Apache Airflow
When supporting a data science team, data engineers are tasked with building a platform that keeps a wide range of stakeholders happy. Data scientists want rapid iteration, infrastructure engineers want monitoring and security controls, and product owners want their solutions deployed in time for quarterly reports. Collaboration among these stakeholders can be difficult, because every data science pipeline has its own set of constraints and system requirements (compute resources, network connectivity, etc.). For these reasons, data engineers strive to give data scientists as much flexibility as possible while maintaining an observable and resilient infrastructure.
In recent years, Apache Airflow (a Python-based task orchestrator originally developed at Airbnb) has gained popularity as a collaboration platform for data scientists and infrastructure engineers who want to spare their users verbose and rigid YAML files. Airflow exposes a flexible, Pythonic interface that serves as a meeting point between data engineers and data scientists: data engineers can build custom operators that abstract away the details of the underlying system, and data scientists can use those operators (along with many others) to build a diverse range of data pipelines.
In this talk, we will take an idea from a single-machine notebook to a cross-service Spark + TensorFlow pipeline, and from there to a canary-tested, hyperparameter-tuned, production-ready model served on Google Cloud Functions. We will show how Apache Airflow can connect all layers of a data team to deliver rapid results.
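The operator idea above can be illustrated with a deliberately tiny, stdlib-only sketch. It is not Airflow's API: the `Operator`, `PythonTask`, and `run_dag` names are hypothetical, and the four stage names only loosely mirror the talk's notebook-to-model pipeline. It shows the two roles the abstract describes: a platform-provided base class that hides execution details, and user code that simply declares tasks and their upstream dependencies, which a runner then executes in dependency order (here via Python's `graphlib.TopologicalSorter`, available since Python 3.9). In Airflow itself, a custom operator is typically written by subclassing `BaseOperator` and overriding `execute(self, context)`.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Optional


class Operator:
    """Hypothetical base class a platform team might provide (not Airflow's API)."""

    def __init__(self, task_id: str, upstream: Optional[List[str]] = None) -> None:
        self.task_id = task_id
        self.upstream = upstream or []  # task_ids this task depends on

    def execute(self, context: Dict[str, Any]) -> Any:
        # Subclasses would hide system details (cluster submission, credentials, ...) here.
        raise NotImplementedError


class PythonTask(Operator):
    """Runs a plain function, passing it the results of its upstream tasks in order."""

    def __init__(self, task_id: str, fn: Callable[..., Any],
                 upstream: Optional[List[str]] = None) -> None:
        super().__init__(task_id, upstream)
        self.fn = fn

    def execute(self, context: Dict[str, Any]) -> Any:
        inputs = [context[dep] for dep in self.upstream]
        return self.fn(*inputs)


def run_dag(tasks: List[Operator]) -> Dict[str, Any]:
    """Execute tasks in dependency order and return {task_id: result}."""
    by_id = {t.task_id: t for t in tasks}
    # TopologicalSorter expects {node: predecessors}; it raises CycleError on cycles.
    graph = {t.task_id: set(t.upstream) for t in tasks}
    results: Dict[str, Any] = {}
    for task_id in TopologicalSorter(graph).static_order():
        results[task_id] = by_id[task_id].execute(results)
    return results


# Hypothetical stages loosely mirroring "notebook -> features -> model".
tasks = [
    PythonTask("extract", lambda: [1.0, 2.0, 3.0, 4.0]),
    PythonTask("features", lambda rows: [x * 2 for x in rows], upstream=["extract"]),
    # Placeholder "model": just the mean of the features.
    PythonTask("train", lambda feats: sum(feats) / len(feats), upstream=["features"]),
    PythonTask("evaluate", lambda model, feats: model > 0 and len(feats) == 4,
               upstream=["train", "features"]),
]

results = run_dag(tasks)
print(results["train"])  # 5.0
```

Because each task only declares what it depends on, a data scientist could add a stage (say, a hyperparameter sweep between `features` and `train`) without touching the runner; a real orchestrator such as Airflow layers scheduling, retries, and monitoring on top of this basic dependency ordering.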
About:
Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business.
Read more here: https://databricks.com/product/unifie...
Connect with us:
Website: https://databricks.com
Facebook: https://www.facebook.com/databricksinc
Twitter: https://twitter.com/databricks
LinkedIn: https://www.linkedin.com/company/databricks
Instagram: https://www.instagram.com/databricksinc/
Databricks is proud to announce that Gartner has named us a Leader in both the 2021 Magic Quadrant for Cloud Database Management Systems and the 2021 Magic Quadrant for Data Science and Machine Learning Platforms. Download the reports here: https://databricks.com/databricks-named-leader-by-gartner
Video "From Idea to Model: Productionizing Data Pipelines with Apache Airflow" from the Databricks channel