Real Azure Data Factory Project | End-to-End CSV to SQL Pipeline

📌 This is a production-style Azure Data Factory project — not a toy example.

In this video, I build an end-to-end, incremental data ingestion pipeline using
Azure Data Factory, Azure Blob Storage, and Azure SQL Database.

Instead of loading a single file, we design a dynamic pipeline that scans a Blob
Storage container and automatically ingests all incoming CSV files into staging
tables, then applies business logic using SQL stored procedures.
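
The discovery step can be sketched outside ADF as follows. This is a minimal illustration, not the pipeline from the video: the input mirrors the shape of the childItems array that the Get Metadata activity returns for a folder, while the function name plan_copy_jobs and the naming convention (customers.csv loads into stg_customers) are assumptions made for this example.

```python
from pathlib import PurePosixPath


def plan_copy_jobs(child_items):
    """Return (file_name, staging_table) pairs for every CSV file in the listing.

    child_items follows the Get Metadata childItems shape:
    a list of {"name": ..., "type": "File" or "Folder"} objects.
    """
    jobs = []
    for item in child_items:
        path = PurePosixPath(item["name"])
        if item["type"] != "File" or path.suffix.lower() != ".csv":
            continue  # skip folders and non-CSV files
        # Assumed convention (hypothetical): <name>.csv -> stg_<name>
        jobs.append((item["name"], f"stg_{path.stem.lower()}"))
    return jobs


# Hypothetical container listing, for illustration only.
listing = [
    {"name": "customers.csv", "type": "File"},
    {"name": "orders.csv", "type": "File"},
    {"name": "archive", "type": "Folder"},
    {"name": "readme.txt", "type": "File"},
]

for file_name, table in plan_copy_jobs(listing):
    print(f"{file_name} -> {table}")
```

In ADF itself, the filtered list would feed the ForEach activity, and each pair would parameterize one Copy Activity run.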

The focus is on real data engineering patterns: staging layers, idempotent loads,
referential integrity validation, and explicit error handling.

────────────────────────────
🔧 PIPELINE OVERVIEW
────────────────────────────
• Dynamic file discovery using Get Metadata
• ForEach loop to process all files in the container
• Copy Activity to load raw data into staging tables
• SQL Stored Procedures to:
– Deduplicate dimension tables
– Validate foreign keys
– Insert valid records
– Quarantine invalid records into an error table
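
The four stored-procedure steps above can be sketched as follows. This is a hedged illustration only: it uses SQLite through Python's sqlite3 module as a stand-in for Azure SQL, and the column names, the error_reason text, and the final staging cleanup are assumptions. The actual procedures live in the repository and may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stg_customers (customer_id INTEGER, name TEXT);
CREATE TABLE stg_orders    (order_id INTEGER, customer_id INTEGER, amount REAL);
CREATE TABLE customers     (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders        (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
CREATE TABLE orders_error  (order_id INTEGER, customer_id INTEGER, amount REAL,
                            error_reason TEXT);

-- Sample staging data: customer 1 is duplicated, order 102 references no customer.
INSERT INTO stg_customers VALUES (1, 'Ada'), (1, 'Ada'), (2, 'Lin');
INSERT INTO stg_orders VALUES (100, 1, 25.0), (101, 2, 40.0), (102, 99, 10.0);
""")


def process_staging(conn):
    """Move staged rows into curated tables; quarantine orders with a bad FK."""
    with conn:  # one transaction: all steps commit together or roll back
        # 1. Deduplicate the dimension: one row per customer_id, new keys only.
        conn.execute("""
            INSERT INTO customers (customer_id, name)
            SELECT s.customer_id, MIN(s.name)
            FROM stg_customers AS s
            WHERE NOT EXISTS (SELECT 1 FROM customers AS c
                              WHERE c.customer_id = s.customer_id)
            GROUP BY s.customer_id
        """)
        # 2 + 4. Validate the foreign key; quarantine failures with a reason.
        conn.execute("""
            INSERT INTO orders_error (order_id, customer_id, amount, error_reason)
            SELECT o.order_id, o.customer_id, o.amount, 'unknown customer_id'
            FROM stg_orders AS o
            WHERE NOT EXISTS (SELECT 1 FROM customers AS c
                              WHERE c.customer_id = o.customer_id)
        """)
        # 3. Insert valid orders that are not already in the curated table.
        conn.execute("""
            INSERT INTO orders (order_id, customer_id, amount)
            SELECT o.order_id, o.customer_id, o.amount
            FROM stg_orders AS o
            WHERE EXISTS (SELECT 1 FROM customers AS c
                          WHERE c.customer_id = o.customer_id)
              AND NOT EXISTS (SELECT 1 FROM orders AS t
                              WHERE t.order_id = o.order_id)
        """)
        # Assumption: staging is emptied afterwards so a re-run
        # does not quarantine the same rows twice.
        conn.execute("DELETE FROM stg_orders")
        conn.execute("DELETE FROM stg_customers")


process_staging(conn)
print(conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
print(conn.execute("SELECT order_id, error_reason FROM orders_error").fetchall())
```

Order 102 ends up in orders_error with its reason rather than being dropped, which is the behavior the overview describes.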

────────────────────────────
🗃️ DATA MODEL
────────────────────────────
Staging tables (raw ingestion):
• stg_customers
• stg_products
• stg_orders

Final curated tables:
• customers
• products
• orders

Error handling:
• orders_error (invalid records + reason)
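
One possible shape for this data model is sketched below, again using SQLite via Python's sqlite3 as a stand-in for Azure SQL. Only the table names come from the project. Every column, type, and constraint here is an assumption chosen to show the idea: staging accepts anything, while the curated layer enforces keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
-- Staging: text columns, no constraints, so raw CSV rows always load.
CREATE TABLE stg_customers (customer_id TEXT, name TEXT);
CREATE TABLE stg_products  (product_id TEXT, name TEXT, price TEXT);
CREATE TABLE stg_orders    (order_id TEXT, customer_id TEXT, product_id TEXT, quantity TEXT);

-- Curated: typed columns with primary and foreign keys.
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE products  (product_id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                        price REAL NOT NULL);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    product_id  INTEGER NOT NULL REFERENCES products(product_id),
    quantity    INTEGER NOT NULL
);

-- Error table: the rejected row as received, plus the reason it was rejected.
CREATE TABLE orders_error (order_id TEXT, customer_id TEXT, product_id TEXT,
                           quantity TEXT, error_reason TEXT NOT NULL);
""")


def curated_accepts(conn, order_row):
    """Try to insert into curated orders; return False if a constraint rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO orders VALUES (?, ?, ?, ?)", order_row)
        return True
    except sqlite3.IntegrityError:
        return False


conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.execute("INSERT INTO products VALUES (5, 'Widget', 9.5)")
conn.execute("INSERT INTO stg_orders VALUES ('1', '999', '5', '2')")  # staging takes it
conn.commit()

print(curated_accepts(conn, (1, 999, 5, 2)))  # customer 999 does not exist
print(curated_accepts(conn, (2, 1, 5, 2)))    # both references resolve
```

Keeping staging unconstrained means a bad row never fails the Copy Activity; it is caught later by the SQL logic and routed to orders_error.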

────────────────────────────
⚙️ KEY DESIGN PRINCIPLES
────────────────────────────
• Separation of concerns:
Azure Data Factory for orchestration,
SQL for business logic

• Incremental & idempotent processing:
Safe to re-run without duplicates

• Explicit error handling:
Invalid data is never silently dropped

• Production-style design:
Clear, explainable, and interview-ready
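
The "safe to re-run" principle can be illustrated with an insert-if-absent load. This is a minimal sketch under stated assumptions: SQLite via Python's sqlite3 stands in for Azure SQL, and the function name load_products and the two-column products table are hypothetical.

```python
import sqlite3


def row_count(conn, table):
    """Count rows in a table (the table name is trusted; illustration only)."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


def load_products(conn):
    """Idempotent load: insert only keys that are not yet in the curated table.

    Returns the number of rows added by this run.
    """
    before = row_count(conn, "products")
    with conn:
        conn.execute("""
            INSERT INTO products (product_id, name)
            SELECT s.product_id, MIN(s.name)
            FROM stg_products AS s
            WHERE NOT EXISTS (SELECT 1 FROM products AS p
                              WHERE p.product_id = s.product_id)
            GROUP BY s.product_id
        """)
    return row_count(conn, "products") - before


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stg_products (product_id INTEGER, name TEXT);
CREATE TABLE products     (product_id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO stg_products VALUES (10, 'Widget'), (11, 'Gadget'), (10, 'Widget');
""")

first_run = load_products(conn)   # loads the two distinct products
second_run = load_products(conn)  # same staging data again: nothing new to add
print(first_run, second_run)
```

The first run adds two rows and the second adds none, so re-running the pipeline over the same files leaves the curated table unchanged.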

────────────────────────────
⏱️ VIDEO CHAPTERS
────────────────────────────
00:00 – Project overview & goals
02:30 – Architecture & data model
05:10 – Azure Blob Storage setup
07:40 – SQL staging & final tables
12:00 – Get Metadata & file discovery
16:30 – ForEach & Copy activity logic
25:10 – Stored procedures (deduplication & FK validation)
36:40 – End-to-end pipeline execution
44:00 – Final validation & conclusions

────────────────────────────
🔗 GITHUB REPOSITORY
────────────────────────────
Full project, SQL schema, and README:
https://github.com/MasouData/adf-data-pipeline-project.git

────────────────────────────
ℹ️ NOTES
────────────────────────────
This project intentionally focuses on correctness, clarity, and
production-style design rather than advanced optimizations such as
CDC, watermarking, or streaming ingestion.

────────────────────────────
🏷️ TAGS
────────────────────────────
#AzureDataFactory #AzureSQL #DataEngineering #ETL #ADF #SQL #Azure

Video: Real Azure Data Factory Project | End-to-End CSV to SQL Pipeline, from the MasouData channel