Open-Source Spotlight - DuckDB - Gabor Szarnyas

DuckDB: a high-performance analytical database system. It is designed to be fast, reliable, portable, and easy to use.

Timecodes:

00:00 Introduction
00:36 Overview of the input CSV files
01:10 Loading with Pandas
01:48 Loading with Spark
02:32 Loading with DuckDB
04:40 Simple SQL queries in DuckDB
05:24 Friendly syntax for aggregation in DuckDB
07:13 Gradual integration with Pandas
08:31 Exporting data to Parquet
08:58 DuckDB CLI client
09:45 Loading Parquet to DuckDB
10:32 Handling nested data sets
12:05 Excluding columns in the SELECT clause
13:30 Running DuckDB in the browser via Wasm
15:40 GitHub repository
16:15 Installation options
17:16 Persistent storage
18:07 More installation options
18:50 Running shell.duckdb.org on a phone
19:25 Why is DuckDB so fast?
20:47 Differences between DuckDB and pandas, compiler optimizations
22:02 Contributing to the project
24:24 Creating extensions to DuckDB
25:30 Advice for our listeners
26:07 Wrapping up

Links:

- DuckDB installation page: https://duckdb.org/docs/installation
- DuckDB's CSV Sniffer: Automatic Detection of Types and Dialects – https://duckdb.org/2023/10/27/csv-sniffer.html

Free ML Engineering course: http://mlzoomcamp.com

Join DataTalks.Club: https://datatalks.club/slack.html
Our events: https://datatalks.club/events.html

Video: Open-Source Spotlight - DuckDB - Gabor Szarnyas, from the DataTalksClub channel