
Day 3 - Parsing unstructured PDFs into a pandas DataFrame - Interview preparation

My name is Divyaprakash and I'm a Data Scientist with 1.5 years of experience. As part of my job hunt, I'm taking part in a daily data science challenge, where I document the day-to-day work I do as part of my upskilling journey.

Day 3: Text preprocessing - Parsing unstructured PDFs into a pandas DataFrame

🔍 PDF Bank Statement to Structured Data using Python

In this video, we dive into extracting transactions from a bank statement PDF and converting them into a structured pandas DataFrame using Python! We use pdfplumber to read the PDF and regex to parse the transaction details.

📌 What you’ll learn:

✅ How to use pdfplumber to extract text from PDFs

✅ Regular expressions to detect dates and amounts

✅ Logic to split transaction blocks into structured rows

✅ Build a transaction dataset directly from unstructured bank statements
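
To give a feel for the steps above, here is a minimal sketch. It is not the notebook's actual code: the statement layout (each transaction starts with a DD/MM/YYYY date and ends with an amount and a running balance, with optional continuation lines) and all function names are assumptions made for illustration, so adjust the patterns to your own bank's format.

```python
import re

# Assumed layout (hypothetical): "DD/MM/YYYY  description  amount  balance",
# optionally followed by continuation lines that belong to the description.
DATE_RE = re.compile(r"^(\d{2}/\d{2}/\d{4})\s+(.*)$")
# Matches amounts such as 250.00 or 9,750.00 (two decimals, optional thousands commas).
AMOUNT_RE = re.compile(r"(?<![\d,.])-?(?:\d{1,3}(?:,\d{3})+|\d+)\.\d{2}(?!\d)")


def extract_text(pdf_path):
    """Return the text of every page, joined with newlines, using pdfplumber."""
    import pdfplumber  # third-party: pip install pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        # extract_text() can return None for pages without a text layer.
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def split_blocks(text):
    """Group lines into transaction blocks; a line starting with a date opens a block."""
    blocks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if DATE_RE.match(line):
            blocks.append([line])
        elif blocks:
            blocks[-1].append(line)  # continuation of the previous transaction
        # Lines before the first date (page headers, column titles) are ignored.
    return blocks


def parse_transactions(text):
    """Turn raw statement text into a list of row dictionaries."""
    rows = []
    for block in split_blocks(text):
        date_str, rest = DATE_RE.match(" ".join(block)).groups()
        amounts = AMOUNT_RE.findall(rest)
        if len(amounts) < 2:
            continue  # assumption: every real transaction shows amount and balance
        description = " ".join(AMOUNT_RE.sub("", rest).split())
        rows.append({
            "date": date_str,
            "description": description,
            "amount": float(amounts[-2].replace(",", "")),
            "balance": float(amounts[-1].replace(",", "")),
        })
    return rows


def to_dataframe(rows):
    """Build a pandas DataFrame with a proper datetime column."""
    import pandas as pd  # third-party: pip install pandas

    df = pd.DataFrame(rows, columns=["date", "description", "amount", "balance"])
    df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")
    return df
```

With a real file you would chain the three steps, for example `to_dataframe(parse_transactions(extract_text("statement.pdf")))`, where `statement.pdf` is a placeholder path. Scanned statements without a text layer need OCR first, which this sketch does not cover.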

Google Colab link: https://colab.research.google.com/drive/1KFATv7J3PrweQE4eBzC4hxNWwCoxZVDA?usp=sharing
LinkedIn: https://www.linkedin.com/in/divyaprakash-rathinasabapathy/
GitHub: https://github.com/rdivyaprakash78
