An Approach to Data Quality for Netflix Personalization Systems
Personalization is one of the key pillars of Netflix, as it enables each member to experience the vast collection of content tailored to their interests. Our personalization system is powered by several machine learning models. These models are only as good as the data that is fed to them. They are trained on hundreds of terabytes of data every day, which makes it a non-trivial challenge to track and maintain data quality. To ensure high data quality, we require three things: automated monitoring of data; visualization to observe changes in the metrics over time; and mechanisms to control data-related regressions, where a data regression is defined as data loss or a distributional shift over a given period of time.
In this talk, we will describe the infrastructure and methods that we used to achieve the above:
– ‘Swimlanes’ that help us define data boundaries for the different environments used to develop, evaluate and deploy ML models
– Pipelines that aggregate data metrics from various sources within each swimlane
– Time series and dashboard visualization tools that cover an atypically long period of time
– Automated audits that periodically monitor these metrics to detect data regressions
We will explain how we use Spark to run aggregation jobs that optimize metric computations, SQL queries to quickly define and test individual metrics, and other ETL jobs that power the visualization and audit tools.
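To make the idea of per-swimlane metric aggregation and automated audits more concrete, here is a minimal sketch. It is not the system described in the talk: it uses plain Python rather than Spark, and the function names (`aggregate_daily_metrics`, `audit`), the metric names, the thresholds, and the sample records are all assumptions chosen for illustration. Data loss is approximated by a drop in row count against a baseline, and a distributional shift by a relative change in the mean, which is only a crude proxy.

```python
from collections import defaultdict
from statistics import mean


def aggregate_daily_metrics(records):
    """Group (swimlane, day, value) rows and compute simple per-day metrics."""
    buckets = defaultdict(list)
    for swimlane, day, value in records:
        buckets[(swimlane, day)].append(value)
    return {
        key: {"row_count": len(values), "mean_value": mean(values)}
        for key, values in buckets.items()
    }


def audit(metrics, swimlane, day, baseline_days,
          max_count_drop=0.5, max_mean_shift=0.25):
    """Compare one day's metrics with the average over the baseline days.

    Thresholds are hypothetical defaults, not values from the talk.
    """
    baseline = [metrics[(swimlane, d)] for d in baseline_days
                if (swimlane, d) in metrics]
    current = metrics.get((swimlane, day))
    if current is None or not baseline:
        return ["missing data"]

    findings = []
    base_count = mean(m["row_count"] for m in baseline)
    base_mean = mean(m["mean_value"] for m in baseline)

    # Data loss: far fewer rows than the baseline average.
    if current["row_count"] < (1 - max_count_drop) * base_count:
        findings.append("data loss")

    # Distributional shift (crude proxy): relative change in the mean.
    if base_mean != 0:
        shift = abs(current["mean_value"] - base_mean) / abs(base_mean)
        if shift > max_mean_shift:
            findings.append("distribution shift")
    return findings


# Hypothetical sample data for a single "prod" swimlane.
records = [
    ("prod", "d1", 1.0), ("prod", "d1", 3.0), ("prod", "d1", 2.0),
    ("prod", "d2", 2.0), ("prod", "d2", 2.0), ("prod", "d2", 2.0),
    ("prod", "d3", 5.0),
]
metrics = aggregate_daily_metrics(records)
print(audit(metrics, "prod", "d3", baseline_days=["d1", "d2"]))
```

Keying the metrics by swimlane keeps the environments separate, so an audit in one environment never compares against data from another. In a real pipeline the aggregation step would more plausibly be a Spark job or SQL query, with the audit running periodically over the stored metrics.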
About:
Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business.
Read more here: https://databricks.com/product/unifie...
Connect with us:
Website: https://databricks.com
Facebook: https://www.facebook.com/databricksinc
Twitter: https://twitter.com/databricks
LinkedIn: https://www.linkedin.com/company/databricks
Instagram: https://www.instagram.com/databricksinc/
Databricks is proud to announce that Gartner has named us a Leader in both the 2021 Magic Quadrant for Cloud Database Management Systems and the 2021 Magic Quadrant for Data Science and Machine Learning Platforms. Download the reports here: https://databricks.com/databricks-named-leader-by-gartner
Video "An Approach to Data Quality for Netflix Personalization Systems" from the Databricks channel