Using Apache Spark 2.0 to Analyze the City of San Francisco's Open Data
Sameer Farooqui delivers a hands-on tutorial using Spark SQL and DataFrames to retrieve insights and visualizations from datasets published by the City of San Francisco. [Topics Indexed Below]
The labs are aimed at an audience with some general programming or SQL query experience but little to no experience with Spark. Sameer will begin with some brief theory and lecture on Spark before diving into several demos performing visualizations and analysis on calls made to the San Francisco Fire Department on July 4th.
Follow Along:
+ Databricks Community Edition: https://databricks.com/try
+ Labs: https://bit.ly/sfopenlabs
+ Learning Material: https://bit.ly/sfopenreadalong
-----Jump to Topic-----
00:00:06 - Workshop Intro & Environment Setup
00:13:06 - Brief Intro to Spark
00:17:32 - Analysis Overview: SF Fire Department Calls for Service
00:23:22 - Analysis with PySpark DataFrames API
00:29:32 - Doing Date/Time Analysis
00:47:53 - Memory, Caching and Writing to Parquet
01:00:40 - SQL Queries
01:21:11 - Convert a Spark DataFrame to a Pandas DataFrame
-----Q & A-----
01:24:43 - Spark DataFrames vs. SQL: Pros and Cons?
01:26:57 - Workflow for Chaining Databricks Notebooks into a Pipeline?
01:30:27 - Is Spark 2.0 ready to use in production?
----------------------------------------------------------------------------------------------
SPARK 2.0 TRAINING | NewCircle | Onsite & Public Classes
----------------------------------------------------------------------------------------------
+ Programming for Spark 2.0 (3 days)
+ Spark 2.0 for Machine Learning & Data Science (3 days)
Learn more: https://newcircle.com/category/apache-spark
++Code for San Francisco++
http://www.meetup.com/Code-for-San-Francisco-Civic-Hack-Night/
++Learn more about Databricks++
https://databricks.com/product/databricks
Video: Using Apache Spark 2.0 to Analyze the City of San Francisco's Open Data, from the InfoQ channel