Amundsen: A Data Discovery Platform From Lyft | Lyft
Download Slides: https://www.datacouncil.ai/talks/amundsen-a-data-discovery-platform-from-lyft
ABOUT THE TALK
In this talk, we discuss what a data discovery experience would look like in an ideal world and what Lyft has done to make that possible. We will introduce Amundsen, an open source data discovery platform from Lyft.
Amundsen is built on 3 key pillars:
1. Augmented Data Graph: Amundsen uses a graph database under the hood to store relationships between various data assets (tables, dashboards, protobuf events, etc.). What's unique to Amundsen is that we treat people as a first-class data asset: there is a graph node for each person in the organization, and it connects to other nodes (such as tables and dashboards).
2. Intuitive User Experience: Amundsen runs PageRank using data from access logs to power search ranking, similar to how Google ranks web pages on the internet.
3. Centralized Metadata: Amundsen gathers metadata from various sources (Hive, Presto, Airflow, etc.) and exposes it in one central place. The right place to store all this metadata is a work in progress.
We will take a deep dive into Amundsen's architecture and discuss how it achieves these 3 design pillars. We will close with the project's roadmap, the problems that remain unsolved, and how we can work together to solve them.
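To make the first two pillars concrete, here is a minimal sketch (not Amundsen's actual implementation) of how people and tables can share one graph and how PageRank over access-log edges can yield a table ranking. Everything in it is an assumption made for illustration: the sample access log, the names `build_graph`, `pagerank`, and `rank_tables`, the damping factor of 0.85, and the use of an in-memory dictionary in place of a graph database.

```python
from collections import defaultdict

# Hypothetical access log: each entry means "this person queried this table".
ACCESS_LOG = [
    ("alice", "rides"),
    ("alice", "drivers"),
    ("bob", "rides"),
    ("carol", "rides"),
    ("carol", "payments"),
]


def build_graph(access_log):
    """Build an undirected adjacency map in which people and tables are both nodes."""
    graph = defaultdict(set)
    for user, table in access_log:
        person_node = ("person", user)
        table_node = ("table", table)
        graph[person_node].add(table_node)
        graph[table_node].add(person_node)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Plain power-iteration PageRank; every node here has at least one neighbor."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Each node starts with the "random jump" share, then receives
        # an equal split of the damped rank of each of its neighbors.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            share = damping * rank[node] / len(graph[node])
            for neighbor in graph[node]:
                new_rank[neighbor] += share
        rank = new_rank
    return rank


def rank_tables(access_log):
    """Return (table_name, score) pairs, highest score first."""
    scores = pagerank(build_graph(access_log))
    tables = [(name, score) for (kind, name), score in scores.items() if kind == "table"]
    return sorted(tables, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_tables(ACCESS_LOG):
        print(f"{name}: {score:.3f}")
```

In this toy log, the table accessed by the most people ends up first in the ranking, which is the intuition behind using access logs to order search results. A production system would also have to decide how to weight repeated accesses and how to treat automated queries, which this sketch ignores.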
ABOUT THE SPEAKERS
Tao Feng is a software engineer on Lyft's data platform team, working on various data products. Tao is a committer and PMC member on Apache Airflow. Previously, Tao worked at LinkedIn and Oracle on data infrastructure, tooling, and performance.
Jin Hyuk Chang is a software engineer on Lyft's data platform team, working on various data products. Jin is a main contributor to Apache Gobblin and Azkaban. Previously, Jin worked at LinkedIn and Amazon Web Services, focusing on big data and service-oriented architecture.
Video "Amundsen: A Data Discovery Platform From Lyft | Lyft" from the Data Council channel