Malware Classifier: ML-Powered PE File Detection | Quantic MSSE Intro to machine learning Project

Malware Classifier using Machine Learning
Quantic School of Business and Technology - MSSE Program

This video demonstrates a deployed malware classification system that analyzes PE (Portable Executable) files using machine learning. The project was completed in partial fulfillment of the requirements for the Introduction to Machine Learning course at Quantic School of Business and Technology.

Live Application:
https://tetteh-apotey-malware-classifier.hf.space

GitHub Repository:
https://github.com/life2allsofts/malware-classifier
(Private - quantic-grader added as collaborator)

PROJECT OVERVIEW

The application uses an XGBoost model with 17 PE header features to classify executable files as malware or benign software. Key features include:

98.03% accuracy on test set

99.59% AUC-ROC for excellent discrimination

17 PE header features (no data leakage)

Bias correction using a 0.6 decision threshold

Fully automated CI/CD pipeline (51 successful runs)
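
As a rough illustration of the 0.6 threshold mentioned above, the sketch below turns a predicted malware probability into a label. The function names, the use of `>=` at the boundary, and the assumption that the score is the probability of the malware class are all hypothetical; the repository may handle these details differently.

```python
from typing import List

# Threshold value taken from the project overview; everything else here is assumed.
MALWARE_THRESHOLD = 0.6


def label_from_probability(p_malware: float, threshold: float = MALWARE_THRESHOLD) -> str:
    """Return 'MALWARE' when the predicted probability reaches the threshold."""
    if not 0.0 <= p_malware <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    # Assumption: a score exactly at the threshold counts as malware.
    return "MALWARE" if p_malware >= threshold else "BENIGN"


def label_batch(probabilities: List[float], threshold: float = MALWARE_THRESHOLD) -> List[str]:
    """Apply the same threshold to a list of probabilities."""
    return [label_from_probability(p, threshold) for p in probabilities]


if __name__ == "__main__":
    print(label_batch([0.10, 0.55, 0.60, 0.97]))
```

With a threshold above 0.5, a file scored at 0.55 is reported as benign, which trades some recall for fewer false positives.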

APPLICATION FEATURES

File Upload Analysis

Upload .exe, .dll, .sys, .ocx, .scr, .cpl files

Extracts SHA-256 hash and entropy

Real-time prediction with confidence scores
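
The hash and entropy steps listed above can be sketched with the Python standard library alone. This is a minimal example, assuming entropy is the Shannon entropy over all bytes of the uploaded file; the application may instead compute it per section or over a different byte range. The function names are illustrative.

```python
import hashlib
import math
from collections import Counter


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the raw file bytes."""
    return hashlib.sha256(data).hexdigest()


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, ranging from 0.0 to 8.0."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # frequency of each byte value
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    sample = b"MZ" + bytes(62)  # placeholder bytes, not a valid PE file
    print(sha256_hex(sample))
    print(round(shannon_entropy(sample), 4))
```

Values near 8.0 indicate data that looks random, which is common for packed or encrypted content, while plain code and text usually score lower.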

Manual Input

Enter all 17 PE features manually
Sample templates for testing
Understand how features influence predictions

Batch Processing
CSV upload for multiple files
Download predictions.csv with results
Ideal for bulk analysis
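
A batch run of this kind could look roughly like the sketch below, which reads CSV rows, scores each one, and emits a predictions CSV. The `score` callable stands in for the trained model, and the added column names `probability` and `prediction` are assumptions; the real predictions.csv layout is not described in the source.

```python
import csv
import io
from typing import Callable, Dict


def run_batch(
    csv_text: str,
    score: Callable[[Dict[str, str]], float],
    threshold: float = 0.6,
) -> str:
    """Score every row of a CSV and return the results as CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Keep the input columns and append two hypothetical output columns.
    fieldnames = list(reader.fieldnames or []) + ["probability", "prediction"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        p = score(row)  # hypothetical model call returning P(malware)
        row["probability"] = f"{p:.4f}"
        row["prediction"] = "MALWARE" if p >= threshold else "BENIGN"
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    demo = "name,entropy\na.exe,7.5\nb.dll,4.2\n"
    # Toy scorer for demonstration only; not the project's model.
    print(run_batch(demo, lambda r: 0.9 if float(r["entropy"]) > 7 else 0.1))
```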

Model Information
Feature importance visualization
Confusion matrix and performance metrics
Complete transparency

CI/CD PIPELINE

The project includes a fully automated GitHub Actions pipeline that:
Runs tests on every push (16 tests in 46 seconds)
Checks for prior bias and model sanity
Auto-deploys to Hugging Face Spaces on success
Performs smoke tests to verify deployment
Total workflow runs: 51 | Latest status: Passing
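
One way a pipeline test might check for prior bias is to confirm that predictions on a mixed sample do not collapse into a single class. The sketch below is an assumption about what such a check could look like, not the project's actual test; the function names and the 0.2 to 0.8 bounds are hypothetical.

```python
from typing import Sequence


def malware_rate(labels: Sequence[str]) -> float:
    """Fraction of predictions labelled 'MALWARE'."""
    if not labels:
        raise ValueError("no predictions to check")
    return sum(1 for label in labels if label == "MALWARE") / len(labels)


def passes_prior_bias_check(labels: Sequence[str], low: float = 0.2, high: float = 0.8) -> bool:
    """True when the predicted malware rate stays inside the allowed band.

    The band is a hypothetical choice for a roughly balanced sample.
    """
    return low <= malware_rate(labels) <= high


if __name__ == "__main__":
    balanced = ["MALWARE", "BENIGN"] * 5
    collapsed = ["MALWARE"] * 10
    print(passes_prior_bias_check(balanced), passes_prior_bias_check(collapsed))
```

A check like this fails fast when a model predicts the same label for every input, before any deployment step runs.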

MODEL PERFORMANCE
Metric       Value
Accuracy     98.03%
Precision    98.24%
Recall       98.37%
F1-Score     98.30%
AUC-ROC      99.59%

Confusion Matrix (Test Set):

                  Predicted BENIGN    Predicted MALWARE
Actual BENIGN     1648                42
Actual MALWARE    38                  2287
False Positives: 42
False Negatives: 38
Total Errors: 80 (1.99% error rate)
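
For readers who want to reproduce the headline metrics, the sketch below derives them from the four confusion matrix counts, treating MALWARE as the positive class (an assumption). The class and property names are illustrative. Computed this way, recall and the error rate match the reported figures, while accuracy, precision, and F1 come out a few hundredths of a percentage point lower (accuracy is about 98.01%, for example); the source does not say whether that gap comes from rounding or from a different evaluation setting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tn: int  # benign predicted benign
    fp: int  # benign predicted malware
    fn: int  # malware predicted benign
    tp: int  # malware predicted malware

    @property
    def total(self) -> int:
        return self.tn + self.fp + self.fn + self.tp

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / self.total

    @property
    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def f1(self) -> float:
        return 2 * self.tp / (2 * self.tp + self.fp + self.fn)


if __name__ == "__main__":
    cm = ConfusionMatrix(tn=1648, fp=42, fn=38, tp=2287)
    for name in ("accuracy", "precision", "recall", "f1"):
        print(f"{name}: {getattr(cm, name) * 100:.2f}%")
```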

TECHNOLOGIES USED
Machine Learning: XGBoost, scikit-learn, pandas, numpy
Web Framework: Flask, Jinja2 templates
Deployment: Hugging Face Spaces, Docker
CI/CD: GitHub Actions
AI Tools: DeepSeek AI (97%), ChatGPT (2%), GitHub Copilot (1%)

DOCUMENTATION
All project documentation is available in the GitHub repository:
Evaluation and Design:
https://github.com/life2allsofts/malware-classifier/blob/main/docs/evaluation-and-design.md

AI Tooling Strategy:
https://github.com/life2allsofts/malware-classifier/blob/main/docs/ai-tooling.md

Deployment Information:
https://github.com/life2allsofts/malware-classifier/blob/main/docs/deployed.md

Results and Metrics:
https://github.com/life2allsofts/malware-classifier/blob/main/results/README.md

ABOUT THE DEVELOPER
Isaac Tetteh-Apotey
MSSE Candidate, Quantic School of Business and Technology
Geomatics Engineer & Software Engineering Researcher
GitHub: https://github.com/life2allsofts
Portfolio: https://tetteh-apotey.vercel.app/
LinkedIn: https://www.linkedin.com/in/isaac-tetteh-apotey-67408b89/

PROJECT TIMELINE
Started: February 17, 2026
Completed: February 28, 2026
Development Time: 11 days
CI/CD Runs: 51 successful workflows

DISCLAIMER
This application is intended for educational and research purposes only. The model should not be used as the sole determinant for malware classification in production environments without additional validation.

For questions about this project, please reach out via GitHub or LinkedIn.

Video "Malware Classifier: ML-Powered PE File Detection | Quantic MSSE Intro to machine learning Project" from the WisdomWord GH channel