Dagster: A New Programming Model for Data Processing | Elementl
Download Slides: https://www.datacouncil.ai/talks/dagster-a-new-programming-model-for-data-processing
WANT TO EXPERIENCE A TALK LIKE THIS LIVE?
Barcelona: https://www.datacouncil.ai/barcelona
New York City: https://www.datacouncil.ai/new-york-city
San Francisco: https://www.datacouncil.ai/san-francisco
Singapore: https://www.datacouncil.ai/singapore
ABOUT THE TALK
This talk introduces Dagster, an open source framework for building and modeling data processing computations. Data processing systems typically span multiple runtime, storage, tooling, and organizational boundaries. But all the stages in a data processing system share a fundamental property: they are directed acyclic graphs (DAGs) of functional computations that consume and produce data assets. Dagster defines a standard for containerizing, describing, and operating these computations. That standard is opinionated and informed by industry best practices, leading to more testable, more reliable, and better-structured data systems.
By defining a standard, one can build these computations in tools that users know and love, such as Jupyter Notebooks (via Papermill), dbt, and Spark, and leverage that standard to build high-quality developer- and ops-facing tools to inspect, operate, and monitor those computations. These tools range from our beautiful introspection and execution tool, Dagit, to tools that schedule these computations on systems ranging from Airflow to Lambda, among others. Dagster embraces the chaotic reality of modern data management and is an abstraction designed for incremental adoption within an increasingly heterogeneous ecosystem. We will describe both the technology and the technical and organizational insights gained from production use of Dagster.
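To make the "DAG of functional computations" idea concrete, here is a minimal sketch in plain Python. It does not use Dagster's API; the function names, the `GRAPH` mapping, and `run_graph` are all hypothetical and exist only for illustration. Each node is a pure function that consumes the outputs of its upstream nodes, and the graph is executed in topological order using the standard library's `graphlib` (Python 3.9+).

```python
from graphlib import TopologicalSorter


# Three hypothetical computations; each consumes data and produces data.
def load_numbers():
    return [1, 2, 3, 4]


def double(numbers):
    return [n * 2 for n in numbers]


def total(doubled):
    return sum(doubled)


# Illustrative graph description: node name -> (function, upstream node names).
GRAPH = {
    "load_numbers": (load_numbers, []),
    "double": (double, ["load_numbers"]),
    "total": (total, ["double"]),
}


def run_graph(graph):
    """Run every node after its dependencies and return all outputs by name."""
    dependencies = {name: deps for name, (_, deps) in graph.items()}
    results = {}
    # static_order() yields each node only after all of its predecessors.
    for name in TopologicalSorter(dependencies).static_order():
        fn, deps = graph[name]
        results[name] = fn(*(results[dep] for dep in deps))
    return results


if __name__ == "__main__":
    print(run_graph(GRAPH))
```

Because the graph is declared as data rather than buried in call sites, a separate tool could inspect, schedule, or monitor it without running it, which is the kind of tooling the abstract describes.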
ABOUT THE SPEAKER
Nick Schrock is the founder and CEO of Elementl, a company aiming to reshape the data management ecosystem, and the creator of Dagster, a new programming model for data processing. Previously, Nick was a Principal Engineer and Director of Engineering at Facebook. During that time, Nick co-created GraphQL and led its implementation and adoption across the entire organization and product line. He also formed the Product Infrastructure group, whose engineers, in addition to GraphQL, created React, React Native, and many other broadly used developer technologies, both inside Facebook and across the technology industry at large.
Video: Dagster: A New Programming Model for Data Processing | Elementl, from the Data Council channel