Scaling Data Engineering Pipelines: Preparing Credit Card Transactions Data for Machine Learning
We discuss two real-world big data engineering use cases, focusing on building stable pipelines and managing storage at petabyte scale. In the first, adopting Delta Lake to optimize data pipelines cut query time by 80% and storage space by 70%. In the second, the Workflows ‘ForEach’ operator runs compute-intensive pipelines across multiple clusters, reducing processing time from months to days. This approach relies on a reusable design pattern that isolates notebooks into units of work, so data scientists can develop and test each unit independently.
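As a rough illustration of the "units of work" pattern described above, the sketch below fans a single function out over a list of inputs with bounded parallelism. It is not the speakers' implementation and does not call any Databricks API: `process_partition`, `for_each`, and the month strings are hypothetical names, and a local thread pool stands in for the separate clusters that a ‘ForEach’ task would use.

```python
from concurrent.futures import ThreadPoolExecutor


def process_partition(partition: str) -> dict:
    """Hypothetical unit of work: the logic one notebook would run for one input."""
    # Placeholder body (assumption): a real unit would read, transform, and write data.
    return {"partition": partition, "status": "ok"}


def for_each(inputs, unit_of_work, concurrency=4):
    """Apply unit_of_work to every input with bounded parallelism, preserving input order."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(unit_of_work, inputs))


if __name__ == "__main__":
    # Hypothetical inputs: one monthly slice of transactions per unit of work.
    months = ["2024-01", "2024-02", "2024-03"]
    for result in for_each(months, process_partition, concurrency=2):
        print(result)
```

Because each unit takes its input as a plain argument, it can be called on a single value during development, which is what would let a data scientist test one unit without running the whole pipeline.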
Talk by: Brandon DeShon, Director, Data Scientist, Mastercard; Luke Garzia, Lead Data Engineer, Mastercard
Here’s more to explore:
Production ready data pipelines for analytics and AI: https://www.databricks.com/solutions/data-engineering
The Big Book of Data Engineering: https://www.databricks.com/resources/ebook/big-book-data-engineering-2nd-edition
See all the product announcements from Data + AI Summit: https://www.databricks.com/events/dataaisummit-2025-announcements
Connect with us:
Website: https://databricks.com
Twitter: https://twitter.com/databricks
LinkedIn: https://www.linkedin.com/company/databricks
Instagram: https://www.instagram.com/databricksinc
Facebook: https://www.facebook.com/databricksinc
Video length: 00:33:57