Spline: Central Data-Lineage Tracking, Not Only For Spark
Data lineage tracking continues to be a major problem for many organizations. The variety of data tools and frameworks used in big companies’ and a lack of standards and universal lineage tracking solutions (especially open-source ones) makes it very difficult or sometimes even impossible to reliably track and visualize dataflows end to end. Spline is one of a very few open-source solutions available nowadays that tries to address that problem. Spline has started as a data-lineage tracking tool for Apache Spark. But now it offers a generic API and model that is capable to aggregate lineage metadata gathered from different data tools, wire it all together, providing a full end-to-end representation of how the data flows through the pipelines, and how it transforms along the way.
In this presentation we will explain how Spline can be used as a central data-lineage tracking tool for the organization. We’ll briefly cover the high-level architecture and design ideas, outline challenges and limitations of the current solution, and talk about deployment options. We’ll also talk about how Spline compares to some other open-source tools, and how OpenLineage standard can be leveraged to integrate with them.
Connect with us:
Website: https://databricks.com
Facebook: https://www.facebook.com/databricksinc
Twitter: https://twitter.com/databricks
LinkedIn: https://www.linkedin.com/company/data...
Instagram: https://www.instagram.com/databricksinc/
Видео Spline: Central Data-Lineage Tracking, Not Only For Spark автора PythonВерный путь
Видео Spline: Central Data-Lineage Tracking, Not Only For Spark автора PythonВерный путь
Информация
4 декабря 2023 г. 3:12:11
00:32:15
Похожие видео