Building a Large-scale Transactional Data Lake Using Apache Hudi
Data is critical infrastructure for building machine learning systems. From ensuring accurate ETAs to predicting optimal traffic routes, providing safe, seamless transportation and delivery experiences on the Uber platform requires reliable, performant large-scale data storage and analysis. In 2016, Uber developed Apache Hudi, an incremental processing framework, to power business-critical data pipelines at low latency and high efficiency, and to help distributed organizations build and manage petabyte-scale data lakes.
In this talk, I will describe what Apache Hudi is and its architectural design, then dive into how it improves data operations through features such as data versioning and time travel.
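To make the versioning and time-travel features concrete, here is a minimal sketch, assuming PySpark with the Hudi Spark bundle on the classpath; the table name, record key, path, and timestamp below are hypothetical placeholders. Only the option-building helper is executed; the Spark calls are shown in comments.

```python
def hudi_write_options(table_name, record_key, precombine_field):
    """Build the core Hudi write options for an upsert.

    Each upsert becomes a new commit on the Hudi timeline, which is
    what enables data versioning and time-travel reads.
    """
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.operation": "upsert",
    }

opts = hudi_write_options("trips", "trip_id", "updated_at")

# With a SparkSession available, an upsert and a time-travel read
# would look roughly like this (not executed here):
#
#   df.write.format("hudi").options(**opts).mode("append").save("/data/trips")
#
#   spark.read.format("hudi") \
#        .option("as.of.instant", "2021-04-30 09:00:00") \
#        .load("/data/trips")
```

The time-travel read returns the table as of the given instant on the commit timeline, so earlier versions of every record remain queryable.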
We will also go over how Hudi brings the Kappa architecture to big data systems and enables efficient incremental processing for near-real-time use cases.
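The incremental-processing idea above can be sketched as an incremental read: instead of rescanning the whole table (batch) or maintaining a separate streaming path (Lambda), consumers pull only the records changed since a given commit from the same table. This is a hedged illustration assuming PySpark with the Hudi bundle; the commit time and path are hypothetical, and only the pure-Python helper runs here.

```python
def hudi_incremental_options(begin_instant):
    """Options asking Hudi for only records changed after a commit.

    Serving both full and incremental reads from one table is the
    Kappa-style unification the talk refers to.
    """
    return {
        "hoodie.datasource.query.type": "incremental",
        "hoodie.datasource.read.begin.instanttime": begin_instant,
    }

inc_opts = hudi_incremental_options("20210430000000")

# With a SparkSession available (not executed here):
#
#   changes = spark.read.format("hudi") \
#                  .options(**inc_opts) \
#                  .load("/data/trips")
```

A downstream pipeline can checkpoint the last commit time it consumed and pass it as `begin_instant` on the next run, turning a batch table into a change stream.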
Speaker: Satish Kotha (Uber)
Apache Hudi committer and engineer at Uber. Previously, he worked on building real-time distributed storage systems such as Twitter's MetricsDB and BlobStore.
Resources:
https://www.aicamp.ai/event/eventdetails/W2021043010
Slides: https://www.slideshare.net/BillLiu31/building-large-scale-transactional-data-lake-using-apache-hudi
Video: Building a Large-scale Transactional Data Lake Using Apache Hudi, from the AICamp channel