From Basic to Advanced Aggregate Operators in Apache Spark SQL 2.2 by Examples and their Catalyst Op
"There are many different aggregate operators in Spark SQL. They range from the very basic groupBy and the not-so-basic groupByKey, which shines bright in Apache Spark Structured Streaming’s stateful aggregations, through the more advanced cube, rollup and pivot, to my beloved windowed aggregations. It’s unbelievable how different their performance characteristics are, even for the same use cases.
What is particularly interesting is the comparison of the simplicity and performance of windowed aggregations vs groupBy. And that’s just Spark SQL alone. Then there is Spark Structured Streaming, which has put the groupByKey operator at the forefront of stateful stream processing (to my surprise, as its performance might not be that satisfactory).
This deep-dive talk is going to show all the different use cases for the aggregate operators and functions as well as their performance differences in Spark SQL 2.2 and beyond. Code and fun included!
Session hashtag: #EUdd5"
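To make the contrast in the abstract concrete, here is a minimal plain-Python sketch of what the three styles of aggregation compute. It is not Spark code and says nothing about performance. The sample rows and the helper names (`group_by_sum`, `windowed_sum`, `rollup_sum`) are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows: (city, year, amount)
rows = [
    ("Warsaw", 2016, 100),
    ("Warsaw", 2017, 200),
    ("Boston", 2016, 50),
    ("Boston", 2017, 150),
]

def group_by_sum(rows):
    """groupBy-style: collapse the input to one total per city."""
    totals = defaultdict(int)
    for city, _year, amount in rows:
        totals[city] += amount
    return dict(totals)

def windowed_sum(rows):
    """Window-style (partitioned by city): keep every input row
    and attach the per-city total to it."""
    totals = group_by_sum(rows)
    return [(city, year, amount, totals[city]) for city, year, amount in rows]

def rollup_sum(rows):
    """rollup(city, year)-style: totals for (city, year), for (city),
    and the grand total; None marks a rolled-up column."""
    totals = defaultdict(int)
    for city, year, amount in rows:
        for key in ((city, year), (city, None), (None, None)):
            totals[key] += amount
    return dict(totals)

by_city = group_by_sum(rows)
per_row = windowed_sum(rows)
rolled = rollup_sum(rows)
```

The grouped result has one entry per city, the windowed result has as many rows as the input, and the rollup adds subtotal and grand-total levels on top of the finest grouping.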
About: Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business.
Read more here: https://databricks.com/product/unified-data-analytics-platform
Connect with us:
Website: https://databricks.com
Facebook: https://www.facebook.com/databricksinc
Twitter: https://twitter.com/databricks
LinkedIn: https://www.linkedin.com/company/databricks
Instagram: https://www.instagram.com/databricksinc/
Databricks is proud to announce that Gartner has named us a Leader in both the 2021 Magic Quadrant for Cloud Database Management Systems and the 2021 Magic Quadrant for Data Science and Machine Learning Platforms. Download the reports here: https://databricks.com/databricks-named-leader-by-gartner
Video: From Basic to Advanced Aggregate Operators in Apache Spark SQL 2.2 by Examples and their Catalyst Op, from the Databricks channel