Building Iceberg native applications in simple Python (Eventual)
Apache Iceberg in 2024 is more usable than ever, and users no longer need to take a hard dependency on heavyweight query engines such as Apache Spark to work with it. This talk showcases use-cases where running simple Python scripts on your laptop will often suffice, and also provides best practices for understanding when distributed query engines are still useful when working with Apache Iceberg.

We do this by sharing what we learned while building a non-JVM distributed query engine on top of Apache Iceberg, and in so doing present some (perhaps surprising) use-cases where Iceberg is now extremely viable without heavyweight frameworks such as Spark:
- Distributed data ingestion into Iceberg tables
- Incremental data processing on partitioned data
- Aggregations on terabyte-scale Iceberg tables from your laptop by leveraging table-level (Iceberg) and file-level (Parquet) metadata
- Table and catalog management

The talk uses query plans from the Daft distributed query engine to show how query engines do all of this for you under the hood, and also provides short Python code snippets to show how simple this can be if you want to implement much of this functionality yourself without a query engine.
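The metadata-driven aggregation use-case mentioned in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not code from the talk and does not use the Daft or PyIceberg APIs: `FileStats`, `prune_files`, `metadata_only_aggregates`, and the sample file list are all hypothetical stand-ins for the per-file row counts and min/max column bounds that Iceberg manifests and Parquet footers record. It also assumes a table with no delete files, so the recorded row counts are exact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Hypothetical stand-in for per-file statistics kept in table metadata."""
    path: str
    row_count: int
    min_value: int  # lower bound of the column of interest in this file
    max_value: int  # upper bound of the column of interest in this file


def prune_files(files, lower, upper):
    """Keep only files whose [min, max] range can overlap [lower, upper]."""
    return [f for f in files if f.max_value >= lower and f.min_value <= upper]


def metadata_only_aggregates(files):
    """Compute COUNT(*), MIN and MAX from statistics alone, reading no data files."""
    if not files:
        return {"count": 0, "min": None, "max": None}
    return {
        "count": sum(f.row_count for f in files),
        "min": min(f.min_value for f in files),
        "max": max(f.max_value for f in files),
    }


# Made-up statistics for three data files (assumption, for illustration only).
FILES = [
    FileStats("part-0.parquet", 1000, 0, 99),
    FileStats("part-1.parquet", 2000, 100, 199),
    FileStats("part-2.parquet", 1500, 200, 299),
]

# A filter on values between 150 and 250 only needs two of the three files.
candidates = prune_files(FILES, lower=150, upper=250)

# Whole-table COUNT/MIN/MAX come straight from the statistics.
summary = metadata_only_aggregates(FILES)
```

Because only the small statistics records are touched, this kind of query scales with the number of files rather than the number of rows, which is what makes laptop-scale aggregation over very large tables plausible.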
Video: Building Iceberg native applications in simple Python (Eventual), from the Apache Iceberg channel
Video information
May 30, 2024, 5:19:58
00:41:03