
Spark Streaming as a Service with Kafka and YARN: Spark Summit East talk by Jim Dowling

Since April 2016, Spark-as-a-service has been available to researchers in Sweden from the Swedish ICT SICS Data Center at www.hops.site. Researchers work in an entirely UI-driven environment on a platform built solely from open-source software.
Spark applications can be deployed as jobs (batch or streaming) or written and run directly from Apache Zeppelin. They run within a project on a YARN cluster, with the novel property that Spark applications are metered and charged to projects. Projects are also securely isolated from each other and include support for project-specific Kafka topics; that is, Kafka topics are protected from access by users who are not members of the project. In this talk we will discuss the challenges in building multi-tenant Spark Streaming applications on YARN that are metered and easy to debug. We show how we use the ELK stack (Elasticsearch, Logstash, and Kibana) for logging and debugging running Spark Streaming applications, how we use Grafana and Graphite for monitoring them, and how users can debug and optimize terminated Spark Streaming jobs using Dr. Elephant. We will also discuss the experiences of our users (over 120 as of September 2016): how they manage their Kafka topics and quotas, patterns for how users share topics between projects, and our novel solutions for helping researchers debug and optimize Spark applications.
To conclude, we will also give an overview of our course ID2223 on Large Scale Learning and Deep Learning, in which 60 students designed and ran SparkML applications on the platform.
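The abstract does not say how per-project topic isolation is enforced. As a rough illustration only, the following Python sketch models the access rule it describes: a Kafka topic belongs to one project, members of that project can use it, and it can be explicitly shared with other projects. Everything here (`TopicRegistry`, `create_topic`, `share_topic`, `can_access`, and the project and user names) is hypothetical and is neither the hops.site implementation nor a Kafka API; a real deployment would enforce such rules with broker-side authorization, for example Kafka ACLs.

```python
# Hypothetical toy model of project-scoped Kafka topic access.
# Not a Kafka, YARN, or hops.site API; it only illustrates the access rule.
from dataclasses import dataclass, field


@dataclass
class Topic:
    name: str
    owner: str                                     # project that created the topic
    shared_with: set = field(default_factory=set)  # other projects granted access


class TopicRegistry:
    def __init__(self):
        self.members = {}  # project name -> set of member user names
        self.topics = {}   # topic name -> Topic

    def add_project(self, project, users):
        self.members[project] = set(users)

    def _is_member(self, user, project):
        return user in self.members.get(project, set())

    def create_topic(self, user, project, topic):
        # Only a member of a project may create topics owned by that project.
        if not self._is_member(user, project):
            raise PermissionError(f"{user} is not a member of {project}")
        if topic in self.topics:
            raise ValueError(f"topic {topic} already exists")
        self.topics[topic] = Topic(topic, project)

    def share_topic(self, user, topic, target_project):
        # Only members of the owning project may share its topic with another project.
        t = self.topics[topic]
        if not self._is_member(user, t.owner):
            raise PermissionError(f"{user} cannot share {topic}")
        if target_project not in self.members:
            raise KeyError(target_project)
        t.shared_with.add(target_project)

    def can_access(self, user, project, topic):
        # A job runs inside one project: the user must belong to that project,
        # and the topic must be owned by it or shared with it.
        t = self.topics.get(topic)
        if t is None or not self._is_member(user, project):
            return False
        return project == t.owner or project in t.shared_with


reg = TopicRegistry()
reg.add_project("genomics", ["alice"])
reg.add_project("climate", ["bob"])
reg.create_topic("alice", "genomics", "reads")
print(reg.can_access("bob", "climate", "reads"))   # False: not shared yet
reg.share_topic("alice", "reads", "climate")
print(reg.can_access("bob", "climate", "reads"))   # True: shared with climate
print(reg.can_access("bob", "genomics", "reads"))  # False: bob is not in genomics
```

Checking access against the project a job runs in, rather than against any project the user happens to belong to, mirrors the point that Spark applications run within a project; whether hops.site applies exactly this rule is not stated in the talk description.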

Video: Spark Streaming as a Service with Kafka and YARN: Spark Summit East talk by Jim Dowling, from the Spark Summit channel
Video information
Published: February 14, 2017, 21:10:56
Duration: 00:31:29
Other videos from the channel
Auto Scaling Systems With Elastic Spark Streaming: Spark Summit East talk by PhuDuc Nguyen
Building Realtime Data Pipelines with Kafka Connect & Spark Streaming by Ewen Cheslack-Postava
Real-time big data processing with Spark Streaming - Tathagata Das (Databricks)
Apache Spark Meet Up at Spark Summit East 2017
The Fast Path to Building Operational Applications with Spark: talk by Nikita Shamgunov
Lessons Learned from Dockerizing Spark Workloads: Spark Summit East talk by Tom Phelan
Building Real Time BI Systems with Kafka, Spark & Kudu: Spark Summit East talk by Ruhollah Farchtchi
Utilizing Spark as the Analytical Core to an Open Source HTAP Relational Database: John Leach
Time Series Analytics with Spark: Spark Summit East talk by Simon Ouellette
Secured Kerberos based Spark Notebook for Data Science: Spark Summit East talk by Joy Chakraborty
Final Demo - Building Streaming Pipelines - Kafka Connect, Spark Structured Streaming and HBase
Teaching Apache Spark Clusters to Manage Their Workers Elastically: Erik Erlandson and Trevor Mckay
Apache Toree: A Jupyter Kernel for Spark: Spark Summit East talk by Marius van Niekerk
Optimizing Spark Deployments for Containers: Isolation, Safety & Performance by William Benton
Spark Streaming: The State of the Union and the Road Beyond - Tathagata Das (Databricks)
Building a Dataset Search Engine with Spark & Elasticsearch: talk by Oscar Castañeda-Villagrán
Flink Vs Spark | Difference between Flink & Spark - Apache Flink Tutorial
Effective Spark with Alluxio: Spark Summit East talk by Gene Pang and Haoyuan Li