Lessons Learned From Running Apache Iceberg at Petabyte Scale // Subsurface 2020
Anton Okolnychyi, Apache Iceberg PMC Member and Apache Spark Contributor, presents "Lessons Learned From Running Apache Iceberg at Petabyte Scale" at Subsurface Summer 2020, the first-ever cloud data lake conference.
Apache Iceberg is an open table format that allows data engineers and data scientists to build efficient and reliable data lakes with features that are normally found only in data warehouses. Specifically, Iceberg brings ACID transactions to tables stored on any object store or distributed file system, boosts the performance of highly selective queries, provides reliable schema evolution, and offers time travel and rollback capabilities. Iceberg lets companies simplify their current architectures and unlock new use cases on top of data lakes.
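Time travel and rollback both follow from the table keeping an immutable history of snapshots: every commit produces a new snapshot, and the metadata records which snapshot is current and when that changed (the Iceberg spec calls this record the snapshot log). The following is a minimal, hypothetical Python model of that idea, not Iceberg's API; the `Snapshot` and `TableHistory` classes and their methods are illustrative assumptions. Reading "as of" a timestamp returns the snapshot that was current at that moment, and a rollback only moves the current pointer back to an older snapshot without rewriting any data files.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    data_files: frozenset  # files visible in this snapshot (illustrative)


@dataclass
class TableHistory:
    """Hypothetical, simplified model of snapshot-based table history."""

    snapshots: Dict[int, Snapshot] = field(default_factory=dict)
    # (timestamp_ms, snapshot_id) pairs, one per change of the current
    # snapshot, appended in time order.
    log: List[Tuple[int, int]] = field(default_factory=list)

    def commit(self, snapshot: Snapshot, timestamp_ms: int) -> None:
        # A write adds a new immutable snapshot and makes it current;
        # older snapshots stay readable.
        self.snapshots[snapshot.snapshot_id] = snapshot
        self.log.append((timestamp_ms, snapshot.snapshot_id))

    def rollback_to(self, snapshot_id: int, timestamp_ms: int) -> None:
        # Rollback only moves the current pointer; no data files change.
        if snapshot_id not in self.snapshots:
            raise KeyError(f"unknown snapshot {snapshot_id}")
        self.log.append((timestamp_ms, snapshot_id))

    def current(self) -> Snapshot:
        if not self.log:
            raise LookupError("table has no snapshots")
        return self.snapshots[self.log[-1][1]]

    def as_of(self, timestamp_ms: int) -> Snapshot:
        # Time travel: find the last log entry at or before the timestamp.
        # The early break relies on the log being in time order.
        found: Optional[int] = None
        for ts, snapshot_id in self.log:
            if ts > timestamp_ms:
                break
            found = snapshot_id
        if found is None:
            raise LookupError("no snapshot was current at that time")
        return self.snapshots[found]


history = TableHistory()
history.commit(Snapshot(1, frozenset({"a.parquet"})), timestamp_ms=1_000)
history.commit(Snapshot(2, frozenset({"a.parquet", "b.parquet"})), timestamp_ms=2_000)
history.rollback_to(1, timestamp_ms=3_000)
print(history.as_of(2_500).snapshot_id)  # 2
print(history.current().snapshot_id)     # 1
```

Recording rollbacks in the log, rather than deleting newer snapshots, is what keeps a query "as of" an earlier time answerable after the rollback.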
This talk will describe how to keep Iceberg tables in optimal shape while running at petabyte scale. In particular, the presentation will focus on how to efficiently perform metadata and data compaction on Iceberg tables with millions of files without any impact on concurrent readers and writers.
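Data compaction targets tables that have accumulated many small files: rewriting them into fewer files close to a target size cuts per-file overhead in planning and reading. Because a compaction commit swaps in a new snapshot atomically, readers keep using the previous snapshot until the rewrite is committed. The sketch below covers only the planning step, grouping small files into rewrite groups with a first-fit-decreasing bin-packing heuristic. The function name `plan_compaction`, the `(path, size)` file representation, the 512 MB target and the 384 MB small-file cutoff are all hypothetical choices for illustration, not values or APIs from the talk or from Iceberg.

```python
from typing import List, Tuple

MB = 1024 * 1024
DataFile = Tuple[str, int]  # (path, size in bytes); hypothetical representation


def plan_compaction(
    files: List[DataFile],
    target_size: int = 512 * MB,           # illustrative target output size
    small_file_threshold: int = 384 * MB,  # hypothetical "small file" cutoff
) -> List[List[DataFile]]:
    """Group small files into rewrite groups whose total size is <= target_size.

    First-fit decreasing: sort candidates largest first and place each one in
    the first group with enough remaining room, else open a new group.
    """
    candidates = sorted(
        (f for f in files if f[1] < small_file_threshold),
        key=lambda f: f[1],
        reverse=True,
    )
    groups: List[List[DataFile]] = []
    totals: List[int] = []
    for path, size in candidates:
        for i, total in enumerate(totals):
            if total + size <= target_size:
                groups[i].append((path, size))
                totals[i] += size
                break
        else:
            groups.append([(path, size)])
            totals.append(size)
    # A group holding a single file would not reduce the file count.
    return [g for g in groups if len(g) > 1]


files = [
    ("a.parquet", 300 * MB),
    ("b.parquet", 200 * MB),
    ("c.parquet", 100 * MB),
    ("d.parquet", 600 * MB),  # already large enough; left alone
    ("e.parquet", 50 * MB),
]
for group in plan_compaction(files):
    print([path for path, _ in group])
# ['a.parquet', 'b.parquet']
# ['c.parquet', 'e.parquet']
```

Each group would then be rewritten and committed as one replacement of its input files; how a real system schedules and commits those rewrites at scale is the subject of the talk and is not modeled here.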
--------------------------------------------------------------------------------------------
Subsurface Is The Industry’s First Cloud Data Lake Conference
Presented by Dremio
Expand your technical knowledge and hear from your peers and industry experts about cloud data lake use cases and architectures at Subsurface™, where we explore what’s below the surface of the data lake. Hear firsthand from open source and technology leaders at companies about their experiences spearheading open source projects and building modern data lakes. Explore real-world use cases, from data warehousing and BI to data science and advanced analytics.
Connect with us!
Event Page https://bit.ly/33Ym5rh
Twitter https://bit.ly/2CqKhHt
Summer 2020 https://bit.ly/3iH160u
Dremio https://bit.ly/2XmtEnN
Video: Lessons Learned From Running Apache Iceberg at Petabyte Scale // Subsurface 2020, from the Dremio channel