
I Built a dbt Pipeline on Real Steel Emissions Data — Here's Every Decision

In this video: My first full dbt project — built on Climate TRACE satellite-verified steel emissions data. The core problem: weighted emissions attribution across multi-owner industrial assets, following GHG Protocol methodology. Full walkthrough of architecture, every model, testing strategy, and production improvements.

What's covered:
- Climate TRACE dataset: satellite-verified vs. self-reported emissions
- Why Iron & Steel: 7–9% of global GHG, joint venture ownership complexity
- Staging → Core → Marts: materialisation decisions at each layer
- Custom generate_schema_name macro — overriding dbt's schema resolution
- COALESCE(ownership_percent, 100) — GHG Protocol Scope 1 attribution
- LEFT JOIN vs INNER JOIN in the fact model — why data loss matters
- Explicit column selection — why SELECT * breaks downstream models
- dim_steel_giants: dual rankings for corporate accountability
- country_emissions_benchmarking: variance from global average
- unique + relationships tests — referential integrity before the join
- severity: warn vs error — when to flag vs when to stop
- What I'd build next: incremental models, source freshness, full documentation
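
A minimal sketch of the COALESCE and LEFT JOIN points above, run against an in-memory SQLite database rather than the project's warehouse. The `assets` and `ownership` tables, the `emissions_t` column, and the sample rows are all hypothetical; only `ownership_percent` and the `COALESCE(ownership_percent, 100)` expression come from the list. It is an illustration of the idea, not the actual fact model from the repo.

```python
import sqlite3

# Hypothetical toy schema: one row per asset, one row per (asset, owner) share.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE assets (asset_id INTEGER PRIMARY KEY, emissions_t REAL);
    CREATE TABLE ownership (asset_id INTEGER, owner TEXT, ownership_percent REAL);

    INSERT INTO assets VALUES (1, 1000.0), (2, 500.0), (3, 200.0);
    -- Asset 1 is a 60/40 joint venture, asset 2 has an owner with no stated
    -- share, and asset 3 has no ownership row at all.
    INSERT INTO ownership VALUES (1, 'A', 60.0), (1, 'B', 40.0), (2, 'C', NULL);
""")

# Weighted attribution: each owner gets emissions * share, and a missing
# share is treated as 100% via COALESCE.
ATTRIBUTION_SQL = """
    SELECT
        a.asset_id,
        o.owner,
        a.emissions_t * COALESCE(o.ownership_percent, 100) / 100.0 AS attributed_t
    FROM assets AS a
    {join_type} JOIN ownership AS o ON o.asset_id = a.asset_id
    ORDER BY a.asset_id, o.owner
"""

left_rows = conn.execute(ATTRIBUTION_SQL.format(join_type="LEFT")).fetchall()
inner_rows = conn.execute(ATTRIBUTION_SQL.format(join_type="INNER")).fetchall()

# Compare total attributed emissions under each join type.
left_total = sum(row[2] for row in left_rows)
inner_total = sum(row[2] for row in inner_rows)
print("LEFT JOIN total: ", left_total)
print("INNER JOIN total:", inner_total)
```

In this toy data the INNER JOIN drops asset 3 entirely, so its emissions vanish from the total, while the LEFT JOIN keeps the row and COALESCE fills the missing share. One caveat of the 100% default: if a multi-owner asset has some shares stated and others missing, the default would over-attribute, so a check that shares sum to at most 100 per asset is worth considering.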

Dataset: Climate TRACE — https://climatetrace.org/data
GitHub: https://github.com/likhit-m/data_warehousing_practice_repo.git

Video from the channel Built by Likhit