Tutorial: How Delta Lake Supercharges Data Lakes
Delta Lake’s transaction log brings high reliability, performance, and ACID-compliant transactions to data lakes. But exactly how does it accomplish this?
Working through concrete examples, we will take a close look at how the transaction logs are managed and leveraged by Delta to supercharge data lakes.
This tech talk covers:
- Enabling and configuring OSS Delta Lake
- Creating Delta Lake tables
- Using history() to view metadata and table versioning
- How Delta manages the log files
- What goes into the transaction logs for various DML operations
- How Delta constructs snapshots of data
- The small file problem and how to mitigate it
- How to construct time travel queries
- Configuring Delta tables for deleted files and log retention
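The snapshot and time travel topics above can be illustrated with a toy model. The sketch below is not Delta Lake's implementation and uses no Delta library. It assumes only the documented log layout: a `_delta_log` directory of JSON commit files named by zero-padded version, each line holding one action such as `add` or `remove`. The helper names (`write_commit`, `snapshot`, `build_demo_log`), the file names, and the sample operations are hypothetical and exist only for illustration; checkpoints, metadata, and protocol actions are left out.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List, Optional, Set


def write_commit(log_dir: Path, version: int, actions: List[Dict]) -> None:
    """Write one commit file: one JSON action per line, named by 20-digit version."""
    path = log_dir / f"{version:020d}.json"
    path.write_text("\n".join(json.dumps(a) for a in actions) + "\n")


def snapshot(log_dir: Path, as_of_version: Optional[int] = None) -> Set[str]:
    """Replay commits in version order; return the data files live at that version.

    Passing as_of_version stops the replay early, which is the idea behind
    a time travel query in this toy model.
    """
    live: Set[str] = set()
    # Zero-padded names sort lexicographically in version order.
    for commit in sorted(log_dir.glob("*.json")):
        version = int(commit.stem)
        if as_of_version is not None and version > as_of_version:
            break
        for line in commit.read_text().splitlines():
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
            # Other actions (e.g. commitInfo) do not change the file set.
    return live


def build_demo_log(root: Path) -> Path:
    """Create a hypothetical three-commit log: initial write, delete, compaction."""
    log_dir = root / "_delta_log"
    log_dir.mkdir(parents=True)
    # Version 0: initial write adds two data files.
    write_commit(log_dir, 0, [
        {"commitInfo": {"operation": "WRITE"}},
        {"add": {"path": "part-0000.parquet"}},
        {"add": {"path": "part-0001.parquet"}},
    ])
    # Version 1: a delete rewrites one file (old file removed, new file added).
    write_commit(log_dir, 1, [
        {"commitInfo": {"operation": "DELETE"}},
        {"remove": {"path": "part-0001.parquet"}},
        {"add": {"path": "part-0002.parquet"}},
    ])
    # Version 2: compaction replaces two small files with one larger file.
    write_commit(log_dir, 2, [
        {"commitInfo": {"operation": "OPTIMIZE"}},
        {"remove": {"path": "part-0000.parquet"}},
        {"remove": {"path": "part-0002.parquet"}},
        {"add": {"path": "part-0003.parquet"}},
    ])
    return log_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo_log = build_demo_log(Path(tmp))
        for v in range(3):
            print(v, sorted(snapshot(demo_log, v)))
```

In this model, removed files stay on disk and remain referenced by older commits, so earlier versions stay readable until the old files and log entries are cleaned up. That is why the retention settings for deleted files and log entries limit how far back a time travel query can reach.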
Speaker: Louis Frolio is a Senior Technical Instructor at Databricks. Leveraging his successful career in Data and AI, Louis trains Databricks business partners on Databricks and Spark. He has two master's degrees, one in Applied Physics from the University of Massachusetts and a second in Strategic Analytics from Brandeis University. Louis lives in New England with his wife and son. As a former professional chef, Louis still considers himself a culinarian and uses his personal time to explore the world of food.
The notebooks for this video can be found at: https://github.com/databricks/tech-talks/tree/master/2020-08-27%20%7C%20How%20Delta%20Lake%20Supercharges%20Data%20Lakes
Databricks is proud to announce that Gartner has named us a Leader in both the 2021 Magic Quadrant for Cloud Database Management Systems and the 2021 Magic Quadrant for Data Science and Machine Learning Platforms. Download the reports here: https://databricks.com/databricks-named-leader-by-gartner