Project Name : Implement Resume Screening using Machine Learning

Our project demonstrates two approaches to automate resume screening using NLP and machine learning. Both notebooks classify resumes as either:

"Fit – Move forward with interview"
"Not a good fit"
The goal of our project was to use natural language processing and data analytics (Z-score analysis) to automatically screen resumes and identify top candidates from a large dataset.

Part 1 of project: Setup
1. Data Cleaning
Original dataset is from Kaggle
Strip HTML tags, punctuation, numbers
Standardize all text styles
Remove stopwords using NLTK
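
The cleaning steps above could be sketched roughly as follows. This is a minimal illustration, not the notebooks' actual code: the function name clean_resume is hypothetical, "standardizing text styles" is assumed to mean lowercasing, and a tiny hard-coded stopword set stands in for the NLTK list so the snippet runs offline.

```python
import re

# Stand-in stopword set (assumption) so the sketch runs without downloads.
# The project uses NLTK's list, e.g. set(nltk.corpus.stopwords.words("english")).
STOPWORDS = {"a", "an", "and", "in", "of", "the", "to", "with", "for", "on", "at", "is"}

def clean_resume(text: str) -> str:
    """Lowercase the text and remove HTML tags, punctuation, digits and stopwords."""
    text = re.sub(r"<[^>]+>", " ", text)    # strip HTML tags
    text = text.lower()                      # assumed meaning of "standardize text styles"
    text = re.sub(r"[^a-z\s]", " ", text)    # drop punctuation and numbers
    tokens = [t for t in text.split() if t not in STOPWORDS]
    return " ".join(tokens)

print(clean_resume("<p>Built 3 ML models in Python & SQL.</p>"))
# built ml models python sql
```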

2. Keyword Scoring
Uses a list of relevant data science keywords from keywords.csv
For each resume, counts how many keywords are present (KeywordScore)
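
As a rough sketch of the KeywordScore idea, the function below counts how many distinct keywords appear in a cleaned resume. The keyword list shown is illustrative only (the real one comes from keywords.csv, whose layout is not described here), and whole-word matching is an assumption about how "present" is decided.

```python
def keyword_score(cleaned_text: str, keywords: list[str]) -> int:
    """Count how many distinct keywords occur as whole words or phrases."""
    # Pad with spaces so "java" does not match inside "javascript".
    padded = " " + " ".join(cleaned_text.lower().split()) + " "
    unique = {kw.strip().lower() for kw in keywords if kw.strip()}
    return sum(1 for kw in unique if f" {kw} " in padded)

# Illustrative keywords, not the contents of keywords.csv.
KEYWORDS = ["python", "sql", "machine learning", "statistics"]

print(keyword_score("built machine learning models python sql", KEYWORDS))
# 3
```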

Part 2 of project: Model building and training
Method 1: Rule-Based Threshold
Any resume with 3 or more keyword matches is labeled as "Fit".
Otherwise labeled as "Not a good fit".
Simple and easy to tune.
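
A minimal sketch of this rule, assuming the keyword count has already been computed for each resume. The function name label_by_threshold and the parameter min_keywords are hypothetical; the default of 3 is the threshold stated above.

```python
def label_by_threshold(keyword_score: int, min_keywords: int = 3) -> str:
    """Label a resume "Fit" when it matches at least min_keywords keywords."""
    return "Fit" if keyword_score >= min_keywords else "Not a good fit"

# Example keyword counts for four resumes (made-up values).
for score in [0, 2, 3, 7]:
    print(score, "->", label_by_threshold(score))
```

Raising or lowering min_keywords is the only tuning knob, which is what makes the rule easy to adapt to a specific job posting.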

Model
Text is vectorized using CountVectorizer
Logistic Regression is trained to classify resumes based on the cleaned text
The threshold of 3 keywords in the rule-based method is a simple heuristic for identifying candidates with basic relevant skills; it can be adjusted to match the job requirements.
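
The model step might look something like the sketch below, which chains scikit-learn's CountVectorizer and LogisticRegression in a pipeline. The toy resumes and their labels are invented for illustration; in the project the labels would come from the keyword-based rule, and the notebooks' actual hyperparameters and train/test split are not known here.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy cleaned resumes (invented). Labels: 1 = "Fit", 0 = "Not a good fit".
texts = [
    "python sql machine learning statistics",
    "python pandas machine learning deep learning",
    "sql statistics python data visualization",
    "retail cashier customer service",
    "truck driver logistics delivery",
    "chef kitchen menu planning",
]
labels = [1, 1, 1, 0, 0, 0]

# Bag-of-words counts feed directly into the classifier.
model = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
model.fit(texts, labels)

prediction = model.predict(["python machine learning sql"])[0]
print("Fit" if prediction == 1 else "Not a good fit")
```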
Method 2: Z-Score Based Labeling
Computes the Z-score of each resume’s KeywordScore

A resume is labeled "Fit" if its Z-score is ≥ 0.5

This method adapts based on the distribution of scores across the dataset

Histogram of Z-scores is plotted to visualize threshold impact

The Z-score was computed by taking each resume's keyword match count, subtracting the average keyword count across all resumes, and dividing by the standard deviation.

Mathematically: Z = (KeywordScore - MeanKeywordScore) / StandardDeviation
This tells us how far a resume’s keyword count is from the average, in units of standard deviation, helping us identify resumes that stand out.
We chose a Z-score threshold of 0.5 to select resumes that have keyword scores at least half a standard deviation above the mean, indicating above-average relevance without being overly restrictive.
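
The Z-score labeling described above can be sketched with the standard library alone. One assumption to flag: the text does not say whether the population or the sample standard deviation is used, so this sketch picks the population version (statistics.pstdev); the function names are hypothetical.

```python
from statistics import mean, pstdev

def z_scores(scores):
    """Return (score - mean) / std for each keyword score."""
    mu = mean(scores)
    sigma = pstdev(scores)  # population std dev (assumption; sample std is also plausible)
    if sigma == 0:
        return [0.0 for _ in scores]  # all scores identical: nothing stands out
    return [(s - mu) / sigma for s in scores]

def label_by_z(scores, threshold=0.5):
    """Label a resume "Fit" when its Z-score is at least the threshold."""
    return ["Fit" if z >= threshold else "Not a good fit" for z in z_scores(scores)]

scores = [1, 2, 3, 4, 5]  # made-up keyword counts
print([round(z, 2) for z in z_scores(scores)])
# [-1.41, -0.71, 0.0, 0.71, 1.41]
print(label_by_z(scores))
# ['Not a good fit', 'Not a good fit', 'Not a good fit', 'Fit', 'Fit']
```

Unlike the fixed cutoff of 3 keywords, the same raw count can be labeled differently here depending on how the rest of the dataset scores.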

Output
Each notebook:

Trains and evaluates a logistic regression model
Prints classification metrics (precision, recall, f1-score)
Displays each resume’s final decision
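
For readers unfamiliar with the reported metrics, the sketch below computes precision, recall and F1 by hand for the "Fit" class. It is only meant to show the definitions; the notebooks presumably rely on a library report such as scikit-learn's classification_report, and the true and predicted labels here are made up.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

y_true = [1, 1, 1, 0, 0, 0]  # 1 = "Fit", 0 = "Not a good fit" (made-up labels)
y_pred = [1, 1, 0, 1, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print({name: round(value, 2) for name, value in metrics.items()})
# {'precision': 0.67, 'recall': 0.67, 'f1': 0.67}
```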

Video: Project Name : Implement Resume Screening using Machine Learning, from the Ignito channel