
Building Iceberg native applications in simple Python (Eventual)

Apache Iceberg in 2024 is more usable than ever, and users no longer need to take a hard dependency on heavyweight query engines such as Apache Spark to work with it. This talk will showcase use-cases where running simple Python scripts on your laptop will often suffice, and will also provide best practices for understanding when distributed query engines are still useful when working with Apache Iceberg.

We do this by sharing what we learned while building a non-JVM distributed query engine on top of Apache Iceberg, and in so doing provide some (perhaps surprising) use-cases where Iceberg is now extremely viable without needing heavyweight frameworks such as Spark:

 - Distributed data ingestion into Iceberg tables
 - Incremental data processing on partitioned data
 - Aggregations on terabyte-scale Iceberg tables from your laptop by leveraging table-level (Iceberg) and file-level (Parquet) metadata
 - Table and catalog management

This talk will use query plans from the Daft distributed query engine to show how query engines do all of this for you under the hood, but will also provide short Python code snippets to show how simple this can be if you want to implement much of this functionality yourself without a query engine.
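To make the metadata point concrete, here is a minimal sketch of how per-file statistics can answer some aggregations and skip files without reading any data. It is not taken from the talk and does not use the Daft or PyIceberg APIs: the `FileStats` class, the helper functions, and the file paths are all hypothetical, and the statistics are hard-coded stand-ins for the record counts and column bounds that Iceberg metadata and Parquet footers keep for each data file.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file summary: a row count plus bounds for one column."""
    path: str
    record_count: int
    lower: int  # smallest value of the column in this file
    upper: int  # largest value of the column in this file


def prune_files(files: List[FileStats], lo: int, hi: int) -> List[FileStats]:
    """Keep only files whose [lower, upper] range overlaps the filter [lo, hi]."""
    return [f for f in files if f.upper >= lo and f.lower <= hi]


def metadata_only_aggregates(files: List[FileStats]) -> Dict[str, Optional[int]]:
    """Compute COUNT(*), MIN and MAX for the column from statistics alone."""
    if not files:
        return {"count": 0, "min": None, "max": None}
    return {
        "count": sum(f.record_count for f in files),
        "min": min(f.lower for f in files),
        "max": max(f.upper for f in files),
    }


# Hard-coded example statistics (assumed values, not read from a real table).
files = [
    FileStats("data/part-0.parquet", 1000, 1, 100),
    FileStats("data/part-1.parquet", 2500, 101, 450),
    FileStats("data/part-2.parquet", 1500, 451, 900),
]

# A filter such as "value BETWEEN 120 AND 200" only needs to open one file.
to_scan = prune_files(files, 120, 200)
totals = metadata_only_aggregates(files)

print([f.path for f in to_scan])
print(totals)
```

In a real table the bounds are only safe for this kind of shortcut when they are exact and no delete files apply, so a query engine typically falls back to scanning the data when those conditions do not hold.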

Video: Building Iceberg native applications in simple Python (Eventual), from the Apache Iceberg channel

ApacheIceberg DataEngineering ApacheSoftwareFoundation
