Real-Time Data Pipelines Made Easy with Structured Streaming in Apache Spark | Databricks

WANT TO EXPERIENCE A TALK LIKE THIS LIVE?
Barcelona: https://www.datacouncil.ai/barcelona
New York City: https://www.datacouncil.ai/new-york-city
San Francisco: https://www.datacouncil.ai/san-francisco
Singapore: https://www.datacouncil.ai/singapore
Download Slides: https://www.datacouncil.ai/talks/building-real-time-data-pipelines-made-easy-with-structured-streaming-in-apache-spark?utm_source=youtube&utm_medium=social&utm_campaign=%20-%20DEC-SF-18%20Slides%20Download

ABOUT THE TALK:

Structured Streaming is the next generation of distributed stream processing in Apache Spark. Developers can write a query in their language of choice (Scala/Java/Python/R) using powerful high-level APIs (DataFrames / Datasets / SQL) and apply that same query to both static datasets and streaming data. For streaming data, Spark automatically creates an incremental execution plan that handles late, out-of-order data and ensures end-to-end exactly-once fault-tolerance guarantees.
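To illustrate the idea of applying one query to both static and streaming data, here is a minimal PySpark sketch. It is not taken from the talk: the schema, the column names (`device`, `reading`), and the function names are hypothetical, and `spark` is assumed to be an existing `SparkSession`.

```python
# Hypothetical schema for illustration only (DDL string form)
SCHEMA = "device STRING, event_time TIMESTAMP, reading DOUBLE"


def readings_per_device(df):
    """The query itself: identical for a static or a streaming DataFrame."""
    return df.where("reading IS NOT NULL").groupBy("device").count()


def batch_counts(spark, path):
    # Static dataset: read the JSON files once and compute a one-off result
    return readings_per_device(spark.read.schema(SCHEMA).json(path))


def streaming_counts(spark, path):
    # Streaming: files arriving under `path` are treated as an unbounded table
    return readings_per_device(spark.readStream.schema(SCHEMA).json(path))
```

The streaming variant only defines the query. Nothing runs until it is started through `writeStream`, for example with `outputMode("complete")` for an aggregation like this one.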

In this practical session, I will walk through a concrete streaming ETL example where, in fewer than 10 lines, you can read raw, unstructured data from Kafka, transform it, and write it out as a structured table ready for batch and ad-hoc queries on up-to-the-last-minute data. I will also give a quick glimpse of advanced features like event-time-based aggregations, stream-stream joins, and arbitrary stateful operations.
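A Kafka-to-table pipeline of the kind described above might look roughly like the following PySpark sketch. This is not the speaker's code. The JSON payload format, the schema, the function name, and all argument values are assumptions made for illustration, and running it requires the Spark Kafka connector package on the classpath.

```python
# Hypothetical payload schema; the talk does not specify one
EVENT_SCHEMA = "device STRING, event_time TIMESTAMP, reading DOUBLE"


def start_kafka_etl(spark, bootstrap_servers, topic, table_path, checkpoint_path):
    """Start a streaming query: Kafka -> parsed columns -> Parquet files."""
    raw = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", topic)
        .load()
    )
    # Kafka delivers the payload as bytes in `value`; cast it to a string,
    # parse it as JSON (assumed format), and flatten the struct into columns
    parsed = (
        raw.selectExpr(f"from_json(CAST(value AS STRING), '{EVENT_SCHEMA}') AS data")
        .select("data.*")
    )
    # The checkpoint location lets Spark track progress across restarts
    return (
        parsed.writeStream
        .format("parquet")
        .option("path", table_path)
        .option("checkpointLocation", checkpoint_path)
        .start()
    )
```

A call such as `start_kafka_etl(spark, "localhost:9092", "events", "/tmp/events_table", "/tmp/events_ckpt")` (all placeholder values) returns a streaming query handle, and the Parquet output can then be read with ordinary batch queries.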

ABOUT THE SPEAKER:

Tathagata is a committer and PMC member of the Apache Spark project and a Software Engineer at Databricks. He is the lead developer of Spark Streaming, and now focuses primarily on Structured Streaming. Previously, he was a graduate student researcher at the AMPLab, UC Berkeley, where he conducted research on data-center frameworks and networks with Scott Shenker and Ion Stoica.

FOLLOW DATA COUNCIL:
Twitter: https://twitter.com/DataCouncilAI
LinkedIn: https://www.linkedin.com/company/datacouncil-ai
Facebook: https://www.facebook.com/datacouncilai

Video: Real-Time Data Pipelines Made Easy with Structured Streaming in Apache Spark | Databricks, from the Data Council channel
Video information
May 31, 2018, 2:00:52
00:35:22