
Keeping Spark on Track: Productionizing Spark for ETL: talk by Kyle Pistor and Miklos Christine

ETL is the first phase when building a big data processing platform. Data arrives from various sources and in various formats, and transforming it into a compact binary format (Parquet, ORC, etc.) allows Apache Spark to process it in the most efficient manner. In this talk, we will discuss common issues and best practices for speeding up your ETL workflows and handling dirty data, and share debugging tips for identifying errors.

Video "Keeping Spark on Track: Productionizing Spark for ETL: talk by Kyle Pistor and Miklos Christine" from the Spark Summit channel
Video information
February 14, 2017, 21:23:16
00:31:40