Invoice QC Service | PDF Extraction + Validation Pipeline | Python, FastAPI, CLI Demo

📌 Project Overview

This video is a walkthrough of my project Invoice QC Service, built as part of the Software Engineer Intern – Data & Development assignment. The system performs automatic PDF invoice extraction, schema validation, and quality control checks, and exposes the logic through both a CLI tool and a FastAPI backend.

🔧 Key Features
1️⃣ PDF → JSON Extraction Module

Extracts invoice fields (invoice number, dates, parties, totals, etc.)

Uses regex + pattern matching to process real B2B invoices

Supports optional line item extraction
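
As a rough illustration of the regex-based approach, here is a minimal sketch that pulls a few fields out of already-extracted invoice text. The patterns, the field names, and the sample text are assumptions made for this example; the actual project may use different rules, and real invoices usually need more tolerant patterns.

```python
import re
from typing import Optional

# Hypothetical patterns for illustration; real invoice layouts vary widely.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"Invoice\s*(?:No\.?|Number|#)\s*[:\-]?\s*([A-Z0-9\-/]+)", re.IGNORECASE
    ),
    "invoice_date": re.compile(
        r"Invoice\s*Date\s*[:\-]?\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE
    ),
    "due_date": re.compile(
        r"Due\s*Date\s*[:\-]?\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE
    ),
    "gross_total": re.compile(
        r"(?:Gross\s*)?Total\s*[:\-]?\s*([0-9]+(?:\.[0-9]{2})?)", re.IGNORECASE
    ),
}


def extract_fields(text: str) -> dict[str, Optional[str]]:
    """Return the first match for each known field, or None if it is absent."""
    result: dict[str, Optional[str]] = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result


# Made-up sample text standing in for the text layer of a PDF page.
sample = """Invoice No: INV-2024-001
Invoice Date: 2024-01-15
Due Date: 2024-02-14
Net: 100.00
Tax: 19.00
Total: 119.00
"""
fields = extract_fields(sample)
```

Returning None for a missing field, rather than raising, lets a later validation step report the gap as a completeness error.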

2️⃣ Validation Engine

Completeness checks (missing fields, invalid dates, empty seller/buyer info)

Format rules (date parsing, currency validation)

Business rules (net + tax = gross, due date ≥ invoice date)

Duplicate detection & anomaly checks
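
The business rules and duplicate detection listed above could be sketched roughly as follows. The field names (`net_total`, `tax_amount`, `gross_total`, `seller_name`, and so on), the error codes, the one-cent tolerance, and the choice of duplicate key are all assumptions for this example, not the project's confirmed schema.

```python
from datetime import date
from decimal import Decimal


def check_business_rules(
    invoice: dict, tolerance: Decimal = Decimal("0.01")
) -> list[str]:
    """Return a list of error codes; an empty list means the rules passed."""
    errors: list[str] = []

    # Rule: net + tax = gross (within a small rounding tolerance).
    net = Decimal(str(invoice["net_total"]))
    tax = Decimal(str(invoice["tax_amount"]))
    gross = Decimal(str(invoice["gross_total"]))
    if abs(net + tax - gross) > tolerance:
        errors.append("totals_mismatch")

    # Rule: due date must not be earlier than the invoice date.
    issued = date.fromisoformat(invoice["invoice_date"])
    due = date.fromisoformat(invoice["due_date"])
    if due < issued:
        errors.append("due_before_invoice_date")

    return errors


def find_duplicates(invoices: list[dict]) -> set[tuple[str, str]]:
    """Return (invoice_number, seller_name) keys that occur more than once."""
    seen: set[tuple[str, str]] = set()
    duplicates: set[tuple[str, str]] = set()
    for inv in invoices:
        key = (inv["invoice_number"], inv["seller_name"])
        if key in seen:
            duplicates.add(key)
        seen.add(key)
    return duplicates


good = {
    "invoice_number": "INV-1", "seller_name": "ACME",
    "invoice_date": "2024-01-15", "due_date": "2024-02-14",
    "net_total": "100.00", "tax_amount": "19.00", "gross_total": "119.00",
}
# Same invoice number and seller, but wrong total and an early due date.
bad = {**good, "gross_total": "120.00", "due_date": "2024-01-01"}
```

Decimal is used instead of float so that sums such as 100.00 + 19.00 compare exactly against the stated gross amount.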

3️⃣ Command-Line Interface (CLI)

Supports:

extract – Convert PDFs into structured JSON

validate – Run validation on extracted data

full-run – End-to-end extraction + validation

Generates detailed reports and summaries
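
A CLI with these three subcommands could be wired up with argparse along the following lines. Only the subcommand names come from the description above; the program name `invoice-qc` and the flags (`--pdf-dir`, `--input`, `--output`, `--report`) are hypothetical and chosen purely for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one subparser per command."""
    parser = argparse.ArgumentParser(prog="invoice-qc")  # hypothetical name
    sub = parser.add_subparsers(dest="command", required=True)

    # extract: PDFs -> structured JSON
    extract = sub.add_parser("extract", help="Convert PDFs into structured JSON")
    extract.add_argument("--pdf-dir", required=True)
    extract.add_argument("--output", required=True)

    # validate: run validation on previously extracted JSON
    validate = sub.add_parser("validate", help="Run validation on extracted data")
    validate.add_argument("--input", required=True)
    validate.add_argument("--report", default=None)

    # full-run: extraction followed by validation
    full = sub.add_parser("full-run", help="End-to-end extraction + validation")
    full.add_argument("--pdf-dir", required=True)
    full.add_argument("--report", default=None)

    return parser


# Parse an example command line instead of sys.argv for demonstration.
args = build_parser().parse_args(
    ["full-run", "--pdf-dir", "pdfs", "--report", "report.json"]
)
```

The parsed `args.command` value can then be used to dispatch to the matching extraction or validation function.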

4️⃣ FastAPI Backend (HTTP API)

Includes:

POST /validate-json – Validate invoice JSON payload

GET /health – Health check endpoint

(Optional) POST /extract-and-validate-pdfs – Run extraction and validation on PDFs in one request
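
The two core endpoints might look something like the sketch below. The required-field list, the response shape, and the `validate_invoices` helper are assumptions for this example rather than the project's actual code. The FastAPI import is guarded only so the validation logic can be run without FastAPI installed.

```python
try:
    from fastapi import FastAPI
except ImportError:  # lets the plain validation logic run without FastAPI
    FastAPI = None

# Hypothetical set of required fields, for illustration only.
REQUIRED_FIELDS = ("invoice_number", "invoice_date", "seller_name", "buyer_name")


def validate_invoices(invoices: list[dict]) -> dict:
    """Check each invoice for missing required fields and build a summary."""
    results = []
    for inv in invoices:
        missing = [f for f in REQUIRED_FIELDS if not inv.get(f)]
        results.append(
            {
                "invoice_number": inv.get("invoice_number"),
                "is_valid": not missing,
                "errors": [f"missing_field:{f}" for f in missing],
            }
        )
    valid = sum(1 for r in results if r["is_valid"])
    return {
        "summary": {
            "total": len(results),
            "valid": valid,
            "invalid": len(results) - valid,
        },
        "results": results,
    }


if FastAPI is not None:
    app = FastAPI(title="Invoice QC Service")

    @app.get("/health")
    def health() -> dict:
        return {"status": "ok"}

    @app.post("/validate-json")
    def validate_json(invoices: list[dict]) -> dict:
        # FastAPI reads a list-typed parameter from the JSON request body.
        return validate_invoices(invoices)
```

Keeping the validation logic in a plain function means the CLI and the HTTP endpoint can share it.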

🖥 Tech Stack

Python 3.10+

FastAPI for backend APIs

pdfplumber / PyPDF2 for PDF extraction

argparse / Typer for CLI tools

Pydantic for data models

JSON reports for validation results

🧩 Architecture
PDFs → Extraction Module → JSON → Validation Engine → CLI / API / Optional UI

📁 Repository

GitHub Repo: https://github.com/Mysteriousboy727/invoice-extraction-qc-system.git

🎥 What’s in This Video?

Project overview

Explanation of schema and validation rules

Code walkthrough (extractor, validator, CLI, API)

Running CLI with sample PDFs

Demo of FastAPI endpoints in action

🧠 Why This Project Matters

This system demonstrates real-world skills in:

Data extraction

Backend development

Validation pipelines

API design

CLI engineering

Clean, modular Python architecture

Video: Invoice QC Service | PDF Extraction + Validation Pipeline | Python, FastAPI, CLI Demo, from the channel Romeo