Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Michael Armbrust
“As Apache Spark becomes more widely adopted, we have focused on creating higher-level APIs that provide increased opportunities for automatic optimization. In this talk, I give an overview of some of the exciting new APIs available in Spark 2.0, namely Datasets and Structured Streaming. Together, these APIs are bringing the power of Catalyst, Spark SQL's query optimizer, to all users of Spark. I'll focus on specific examples of how developers can build their analyses more quickly and efficiently simply by providing Spark with more information about what they are trying to accomplish.” - Michael
Slides: http://www.slideshare.net/databricks/structuring-spark-dataframes-datasets-and-streaming-62871797
Databricks Blog: "Deep Dive into Spark SQL’s Catalyst Optimizer"
https://databricks.com/blog/2015/04/13/deep-dive-into-spark-sqls-catalyst-optimizer.html
// About the Presenter //
Michael Armbrust is the lead developer of the Spark SQL project at Databricks. He received his PhD from UC Berkeley in 2013, and was advised by Michael Franklin, David Patterson, and Armando Fox. His thesis focused on building systems that allow developers to rapidly build scalable interactive applications, and specifically defined the notion of scale independence. His interests broadly include distributed systems, large-scale structured storage and query optimization.
Follow Michael on -
Twitter: https://twitter.com/michaelarmbrust
LinkedIn: https://www.linkedin.com/in/michaelarmbrust
Video: Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Michael Armbrust, from the Spark Summit channel