"Exploring Wikipedia With Apache Spark" - Advanced Training by Sameer Farooqui (Databricks)

Live Big Data Training from Spark Summit 2016 in San Francisco.

"The real power and value proposition of Apache Spark is in building a unified use case that combines ETL, batch analytics, real-time stream analysis, machine learning, graph processing and visualizations. In class we will explore various Wikipedia datasets while applying the ideal programming paradigm for each analysis. The class will consist of about 50% lecture and 50% hands-on labs and demos." - Sameer

Class covers:
- Spark SQL and DataFrames
- Spark Streaming
- Machine Learning (NLP, k-means clustering, TF-IDF, PageRank, Shortest Path)
- GraphFrames
- Visualizations (Databricks, Matplotlib, Google Charts, D3.js)
- Advanced Performance Tuning and Debugging
- Spark UI

Data sets that we explore:
- Pageviews (March 2015) - 255 MB
- Clickstream (Feb 2015) - 1.2 GB
- Pagecounts (last hour) - ~550 MB
- English Wikipedia (Mar 2016) - 54 GB
- 6 Wikipedia Language Live Edit Streams (variable)

// About the Presenter //
Sameer Farooqui is a Technology Evangelist at Databricks, where he helps promote the adoption of Apache Spark. As a founding member of the training team, he created and taught advanced Spark classes for private clients and at meetups and conferences worldwide.

Follow Sameer:
Twitter: https://twitter.com/blueplastic
LinkedIn: https://www.linkedin.com/in/blueplastic

Video "Exploring Wikipedia With Apache Spark" - Advanced Training by Sameer Farooqui (Databricks), from the Spark Summit channel
