
"Exploring Wikipedia With Apache Spark" - Advanced Training by Sameer Farooqui (Databricks)

Live Big Data Training from Spark Summit 2016 in San Francisco.

"The real power and value proposition of Apache Spark is in building a unified use case that combines ETL, batch analytics, real­ time stream analysis, machine learning, graph processing and visualizations. In class we will explore various Wikipedia datasets while applying the ideal programming paradigm for each analysis. The class will comprise of about 50% lecture and 50% hands-on labs + demos." - Sameer

Class covers:
- Spark SQL and DataFrames
- Spark Streaming
- Machine Learning (NLP, k-means clustering, TF-IDF, PageRank, Shortest Path)
- GraphFrames
- Visualizations (Databricks, Matplotlib, Google Charts, D3.js)
- Advanced Performance Tuning and Debugging
- Spark UI

Data sets that we explore:
- Pageviews (March 2015) - 255 MB
- Clickstream (Feb 2015) - 1.2 GB
- Pagecounts (last hour) - ~550 MB
- English Wikipedia (Mar 2016) - 54 GB
- 6 Wikipedia Language Live Edit Streams (variable)

// About the Presenter //
Sameer Farooqui is a Technology Evangelist at Databricks where he helps promote the adoption of Apache Spark. As a founding member of the training team, he created and taught advanced Spark classes at private clients, meetups and conferences globally.

Follow Sameer on:
Twitter: https://twitter.com/blueplastic
LinkedIn: https://www.linkedin.com/in/blueplastic

Video "Exploring Wikipedia With Apache Spark" - Advanced Training by Sameer Farooqui (Databricks), from the Spark Summit channel
Video information
June 16, 2016, 5:39:26
02:37:24