A machine learning project that predicts whether a student will pass or fail based on personal, social, and academic factors. Built as a capstone for the Fundamentals of AI and ML course.
In many educational institutions, students who are at risk of failing are identified only after it is too late to intervene. This project builds an early-warning classifier using supervised learning that can flag at-risk students based on observable characteristics, enabling teachers and counsellors to provide timely support.
| Concept | Application |
|---|---|
| Supervised Learning | Binary classification (Pass / Fail) |
| Decision Tree Classifier | Main model — interpretable and visual |
| Train/Test Split | 80/20 stratified split |
| Model Evaluation | Accuracy, Precision, Recall, F1-Score, Confusion Matrix |
| Feature Importance | Identifying the strongest predictors of student outcomes |
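
The concepts in the table fit together roughly as follows. This is a minimal sketch on synthetic data, not the project's actual script: the names `G2` and `failures` come from the dataset, but the generated values, `max_depth=4`, and `random_state=42` are assumptions made for illustration.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real data (assumption: all values are made up).
rng = np.random.default_rng(0)
n_students = 395
g2 = rng.integers(0, 21, size=n_students)        # second-period grade, 0-20
failures = rng.integers(0, 4, size=n_students)   # past failures, 0-3
noise = rng.normal(0, 2, size=n_students)
y = ((g2 - failures + noise) >= 10).astype(int)  # 1 = Pass, 0 = Fail
X = np.column_stack([g2, failures])

# 80/20 split; stratify=y keeps the Pass/Fail ratio similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# A shallow tree stays easy to read (max_depth is an assumed value).
clf = DecisionTreeClassifier(max_depth=4, random_state=42)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print(f"Test size: {len(X_test)}  Accuracy: {accuracy:.2%}")
print(cm)  # rows = actual (Fail, Pass), columns = predicted (Fail, Pass)
```

With 395 students, a 20% test split holds out 79 of them, which matches the support total in the example report.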
student-performance-predictor/
│
├── student_performance_predictor.py # Main ML script
├── student-mat.csv # Dataset
├── results.png # Output: confusion matrix + feature importance chart
├── README.md # This file
└── report/
    └── Project_Report.pdf # Full project report
Source: UCI Machine Learning Repository — Student Performance Dataset
- File needed: `student-mat.csv` (Mathematics course data)
- Students: 395
- Features: 33 attributes covering demographics, family background, study habits, and grades
- Target: Final grade `G3` → converted to Pass (≥10) or Fail (<10)
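
The target conversion can be sketched in a few lines. This is an illustrative example using only the standard library, not the project's code: `label_outcome` and `PASS_MARK` are hypothetical names, the sample rows are made up, and the semicolon delimiter is an assumption about how the UCI file is formatted.

```python
import csv
import io

PASS_MARK = 10  # threshold stated above: G3 >= 10 counts as a pass


def label_outcome(g3: int) -> str:
    """Map a final grade (0-20 scale) to 'Pass' or 'Fail'."""
    return "Pass" if g3 >= PASS_MARK else "Fail"


# Tiny made-up sample in the same shape as the CSV (assumption: the UCI
# file is semicolon-separated; only three columns are shown here).
sample_csv = "G1;G2;G3\n5;6;6\n15;14;15\n9;10;10\n"

rows = list(csv.DictReader(io.StringIO(sample_csv), delimiter=";"))
labels = [label_outcome(int(row["G3"])) for row in rows]
print(labels)  # ['Fail', 'Pass', 'Pass']
```

A grade of exactly 10 falls on the Pass side, since the rule is ≥10.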
git clone https://github.com/<your-username>/student-performance-predictor.git
cd student-performance-predictor
pip install pandas numpy scikit-learn matplotlib seaborn

Python 3.8 or higher is recommended.
Download student-mat.csv from the UCI link above and place it in the project root folder.
python student_performance_predictor.py

After running, you will see:
- Accuracy score and full classification report in the terminal
- A saved image `results.png` containing:
  - Confusion Matrix (actual vs predicted outcomes)
  - Feature Importance bar chart (which factors matter most)
- Human-readable decision rules extracted from the tree
- A demo prediction for a sample student profile
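
The last two outputs, the decision rules and the demo prediction, can be produced along these lines. This is a hedged sketch on a toy dataset rather than the script's real logic: the eight training rows, `max_depth=2`, and the sample student are all made up, and only `export_text` and `predict` are standard scikit-learn calls.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy training data (made up): columns are G2 and failures.
X = np.array([[4, 2], [6, 1], [8, 0], [9, 3], [11, 0], [13, 0], [15, 1], [18, 0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # 0 = Fail, 1 = Pass

clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# export_text renders the fitted tree as indented if/else style rules.
rules = export_text(clf, feature_names=["G2", "failures"])
print(rules)

# Demo prediction for one hypothetical student: G2 = 12, no past failures.
sample_student = np.array([[12, 0]])
label = "Pass" if clf.predict(sample_student)[0] == 1 else "Fail"
print(f"Predicted outcome: {label}")
```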
Example terminal output:
Accuracy : 91.14%
Classification Report:
              precision    recall  f1-score   support

        Fail       0.79      0.73      0.76        15
        Pass       0.93      0.95      0.94        64

    accuracy                           0.91        79
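
To show how the numbers in this report relate to raw counts, here is a small worked example. The accuracy follows directly from the output (91.14% of 79 students is 72 correct predictions). The Fail-class counts are an assumption: the README does not list the confusion matrix, and these values were chosen because they reproduce the Fail row.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 for one class from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Accuracy: 72 correct predictions out of 79 test students.
accuracy = 72 / 79
print(f"Accuracy : {accuracy:.2%}")  # Accuracy : 91.14%

# Assumed counts for the Fail class (not given in the README): 11 failing
# students caught, 3 passing students wrongly flagged, 4 failing students missed.
precision, recall, f1 = precision_recall_f1(tp=11, fp=3, fn=4)
print(f"Fail  precision={precision:.2f}  recall={recall:.2f}  f1={f1:.2f}")
```

For an early-warning system, recall on the Fail class is the figure to watch: it is the share of at-risk students the model actually flags.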
- G2 (second-period grade) is the strongest predictor of final outcome — prior performance is highly predictive.
- Number of past failures is the second most important feature.
- Students who aspire to higher education (`higher = yes`) significantly outperform those who do not.
- Lifestyle factors like `goout` and `Walc` (weekend alcohol consumption) have a measurable negative impact.
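
Rankings like the ones above are typically read from a fitted tree's `feature_importances_` attribute. The sketch below shows the mechanics only. It uses synthetic data in which the label depends on `G2` by construction, so its output says nothing about the real dataset, and `max_depth=3` is an assumed value.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

feature_names = ["G2", "failures", "goout"]

# Synthetic data (assumption): the label depends on G2 by construction,
# so this shows the mechanics of ranking, not the project's findings.
rng = np.random.default_rng(1)
n = 300
g2 = rng.integers(0, 21, size=n)
failures = rng.integers(0, 4, size=n)
goout = rng.integers(1, 6, size=n)
y = ((g2 + rng.normal(0, 1.5, size=n)) >= 10).astype(int)
X = np.column_stack([g2, failures, goout])

clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# feature_importances_ sums to 1; higher means the feature contributed
# more impurity reduction across the tree's splits.
ranked = sorted(
    zip(feature_names, clf.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{name:<10} {score:.3f}")
```

Importances are non-negative and carry no direction, so whether a factor helps or hurts needs a separate check, for example comparing pass rates across `goout` levels.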
| Package | Version |
|---|---|
| Python | ≥ 3.8 |
| pandas | ≥ 1.3 |
| numpy | ≥ 1.21 |
| scikit-learn | ≥ 1.0 |
| matplotlib | ≥ 3.4 |
| seaborn | ≥ 0.11 |
Install all at once:
pip install pandas numpy scikit-learn matplotlib seaborn

- Add a web interface using Streamlit so teachers can input student details and get a prediction
- Try Random Forest or Gradient Boosting for better accuracy
- Incorporate the Portuguese language course dataset (`student-por.csv`) for broader generalisation
- Use SHAP values for more detailed model explainability
This project is submitted as part of an academic course evaluation. Dataset credit: P. Cortez and A. Silva, University of Minho, Portugal.
Shivam Rathore
Course: Fundamentals of AI and ML