
Building a Large-scale Transactional Data Lake Using Apache Hudi

Data is critical infrastructure for building machine learning systems. From ensuring accurate ETAs to predicting optimal traffic routes, providing safe, seamless transportation and delivery experiences on the Uber platform requires reliable, performant, large-scale data storage and analysis. In 2016, Uber developed Apache Hudi, an incremental processing framework, to power business-critical data pipelines at low latency and high efficiency, and to help distributed organizations build and manage petabyte-scale data lakes.
In this talk, I will describe what Apache Hudi is and its architectural design, then dive into how Hudi improves data operations through features such as data versioning and time travel.
We will also go over how Hudi brings the kappa architecture to big data systems and enables efficient incremental processing for near-real-time use cases.
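As a rough illustration of the data versioning, time travel, and incremental-query capabilities mentioned above (not part of the talk itself), the sketch below uses Hudi's Spark DataSource read options as documented for recent Hudi releases. The table location `base_path` and the instant timestamps are hypothetical placeholders, and option names such as `as.of.instant` may not exist in older Hudi versions.

```python
# Minimal PySpark sketch of Hudi time-travel and incremental reads.
# Assumes a Hudi table already exists at `base_path`; the path and
# instant times below are placeholders, not values from the talk.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hudi-read-examples")
    # The Hudi Spark bundle must be on the classpath (e.g. via --packages).
    .getOrCreate()
)

base_path = "s3://my-bucket/hudi/trips"  # hypothetical table location

# Time travel: read the table as of a past commit instant (data versioning).
as_of_df = (
    spark.read.format("hudi")
    .option("as.of.instant", "2021-04-30 00:00:00")
    .load(base_path)
)

# Incremental query: fetch only records committed after a given instant,
# instead of rescanning the whole table.
incremental_df = (
    spark.read.format("hudi")
    .option("hoodie.datasource.query.type", "incremental")
    .option("hoodie.datasource.read.begin.instanttime", "20210430000000")
    .load(base_path)
)

incremental_df.createOrReplaceTempView("trips_incremental")
spark.sql("SELECT count(*) FROM trips_incremental").show()
```

Because an incremental read returns only the records changed after a given commit, downstream jobs can run frequently over small deltas rather than recomputing the full table, which is the basis of the kappa-style, near-real-time processing discussed in the talk.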

Speaker: Satish Kotha (Uber)
Apache Hudi committer and engineer at Uber. Previously, he worked on building real-time distributed storage systems such as Twitter's MetricsDB and BlobStore.

Resources:
Event: https://www.aicamp.ai/event/eventdetails/W2021043010
Slides: https://www.slideshare.net/BillLiu31/building-large-scale-transactional-data-lake-using-apache-hudi

Video: Building a Large-scale Transactional Data Lake Using Apache Hudi (AICamp channel)
Video information
Published: April 30, 2021, 23:41:24
Duration: 00:49:54