Skip to content

AbhiYewale96/Student-Performance-Analysis

Repository files navigation

Student Performance Analysis

GitHub License Python SQL R Status

Author: Abhishek Yewale
Date: 25/02/2025
Project Status: Complete


πŸ“‹ Project Overview

A comprehensive data analysis project examining the academic performance of 200 students. This project combines statistical analysis, machine learning, and business intelligence to identify key performance drivers and provide actionable insights for educational improvement.

🎯 Objectives

  • Analyze relationships between study hours, attendance, sleep patterns, and exam performance
  • Identify high-performing and struggling students
  • Develop predictive models for exam score forecasting
  • Provide evidence-based recommendations for academic improvement
  • Create interactive dashboards for data visualization and monitoring

πŸ“Š Dataset Overview

Data Specifications

Attribute Value
Total Records 200 Students
Total Variables 6 Features
Data Completeness 100% (No Missing Values)
Time Period Single Academic Term
Data Quality Validated & Verified

Dataset Variables

Variable Type Range Unit Description
student_id String S001-S200 - Unique student identifier
hours_studied Numeric 1.0-12.0 hours/week Weekly study time for exam prep
sleep_hours Numeric 4.0-9.0 hours/night Average daily sleep duration
attendance_percent Numeric 50.0-100.0 % Percentage of classes attended
previous_scores Numeric 40-95 points Historical exam performance
exam_score Numeric 17.1-51.3 points Current exam score (Target)

πŸ—‚οΈ Project Structure

student-performance-analysis/
β”‚
β”œβ”€β”€ README.md                                    # This file
β”œβ”€β”€ data/
β”‚   └── student_exam_scores.csv                 # Raw dataset (200 records)
β”‚
β”œβ”€β”€ sql/
β”‚   └── Student_Performance_Analysis.sql        # Database queries and operations
β”‚
β”œβ”€β”€ python/
β”‚   └── student_performance_analysis.py         # Python ML analysis & predictions
β”‚
β”œβ”€β”€ r/
β”‚   └── Student_Performance_Analysis.R          # R visualization scripts
β”‚
β”œβ”€β”€ power-bi/
β”‚   └── Student_Performance_Analysis_Dashboard.pbix  # Interactive dashboard
β”‚
β”œβ”€β”€ reports/
    β”œβ”€β”€ Student_Performance_Analysis_Report.docx     # Complete project report
    β”œβ”€β”€ Executive_Summary_Visualizations.png         # 12-panel dashboard

---

## πŸ” Key Findings

### Performance Analysis

- **Mean Exam Score:** 33.57 (Range: 17.1 - 51.3)
- **Pass Rate:** 25.5% (51 out of 200 students)
- **Fail Rate:** 74.5% (149 out of 200 students)
- **Pass Threshold:** >38 points

### Study Habits

- **Average Study Hours:** 6.31 hours/week
- **High Performers (>8 hrs/week):** 29.5% (59 students)
- **At-Risk Students (<3 hrs/week):** 19.0% (38 students)

### Attendance Metrics

- **Average Attendance:** 75.18%
- **High Attendance (β‰₯75%):** 50.0% (100 students)
- **Low Attendance (<75%):** 50.0% (100 students)

### Correlation Analysis

| Factor | Correlation with Exam Score | Strength |
|--------|------------------------------|----------|
| Study Hours | 0.478 | **Strong Positive** |
| Previous Score | 0.271 | Moderate Positive |
| Attendance | 0.253 | Moderate Positive |
| Sleep Hours | -0.019 | Negligible |

### High Performers (Top Quartile)

- **Count:** 50 students (25%)
- **Average Study Hours:** 9.17 hours/week
- **Average Attendance:** 75.68%
- **Average Score:** >38.30

### Low Performers (Bottom Quartile)

- **Count:** 50 students (25%)
- **Average Study Hours:** 3.83 hours/week
- **Average Attendance:** 74.86%
- **Average Score:** <28.85

---

## πŸ› οΈ Technology Stack

### Data Processing & Analysis
- **Python 3.8+**
  - `pandas` - Data manipulation
  - `numpy` - Numerical computing
  - `scikit-learn` - Machine learning
  - `matplotlib` - Data visualization
  - `seaborn` - Statistical visualization

### Database
- **SQL** (MySQL/PostgreSQL)
  - Data extraction
  - Aggregation
  - Filtering
  - Window functions

### Statistical Visualization
- **R 4.0+**
  - `ggplot2` - Advanced graphics
  - `dplyr` - Data manipulation

### Business Intelligence
- **Power BI**
  - Interactive dashboards
  - Real-time monitoring
  - KPI tracking

---

## πŸ“ˆ Analysis Methods

### 1. Descriptive Statistics
- Measures of central tendency (Mean, Median, Mode)
- Measures of dispersion (Variance, Standard Deviation, IQR)
- Quartile analysis (25%, 50%, 75%)

### 2. Correlation Analysis
- Pearson correlation coefficient
- Correlation matrix
- Relationship strength interpretation

### 3. Predictive Modeling
- **Linear Regression** using Ordinary Least Squares (OLS)
- Train-test split: 80% training, 20% testing
- Model evaluation metrics:
  - R-squared (RΒ²)
  - Mean Squared Error (MSE)
  - Mean Absolute Error (MAE)
  - Root Mean Squared Error (RMSE)

### 4. Data Visualization
- Histograms (Score distribution)
- Box plots (Outlier detection)
- Scatter plots (Relationship analysis)
- Pie charts (Categorical breakdown)
- Line plots (Trend analysis)
- Correlation heatmaps

---

## πŸ’» Installation & Setup

### Prerequisites

```bash
# Python 3.8 or higher
python --version

# R 4.0 or higher
R --version

# SQL Server (MySQL/PostgreSQL)

Python Setup

# Clone the repository
git clone https://github.com/yourusername/student-performance-analysis.git
cd student-performance-analysis

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install required packages
pip install -r requirements.txt

Create requirements.txt

pandas>=1.3.0
numpy>=1.21.0
scikit-learn>=1.0.0
matplotlib>=3.4.0
seaborn>=0.11.0
jupyter>=1.0.0

R Setup

# Install required packages
install.packages("ggplot2")
install.packages("dplyr")
install.packages("tidyverse")

πŸš€ Usage

1. Python Analysis

# Run the analysis script
python python/student_performance_analysis.py

# Or use Jupyter notebook
jupyter notebook notebooks/analysis_notebook.ipynb

2. SQL Queries

# Connect to your database
mysql -u username -p database_name < sql/Student_Performance_Analysis.sql

3. R Visualization

# Run R script
Rscript r/Student_Performance_Analysis.R

# Or in RStudio
source("r/Student_Performance_Analysis.R")

4. Power BI Dashboard

  1. Open Power BI Desktop
  2. Load power-bi/Student_Performance_Analysis_Dashboard.pbix
  3. Connect to your data source
  4. Explore interactive visualizations

πŸ“Š Output Files

Reports Generated

File Format Description
Student_Performance_Analysis_Report.docx Word Complete 10-section professional report
Executive_Summary_Visualizations.png PNG 12-panel visualization dashboard (300 DPI)
Key_Metrics_Summary.txt Text Concise statistics and insights
Technical_Specifications.md Markdown Technical implementation details
PROJECT_COVER_PAGE.txt Text Project overview and statistics

πŸ“Œ Key Recommendations

For Educational Administrators

  • βœ… Implement study hour tracking systems
  • βœ… Strengthen attendance policies
  • βœ… Establish peer tutoring programs
  • βœ… Create early warning systems for at-risk students

For Academic Counselors

  • βœ… Conduct individual consultations with underperformers
  • βœ… Develop personalized study plans
  • βœ… Monitor student wellness and sleep patterns
  • βœ… Provide targeted interventions

For Students

  • βœ… Allocate minimum 8+ hours per week for exam preparation
  • βœ… Maintain 85%+ class attendance
  • βœ… Ensure 7-9 hours of sleep per night
  • βœ… Utilize available tutoring and study resources
  • βœ… Track personal progress and adjust strategies

πŸ“ˆ Model Performance

Linear Regression Model

Metric Value Interpretation
R-squared 0.32-0.45 32-45% of variance explained
RMSE 5.2-6.8 Average prediction error
MAE 4.1-5.3 Average absolute error
Training/Test Split 80/20 160 training, 40 test samples

πŸ” Data Security & Privacy

  • βœ… Student IDs anonymized (S001-S200)
  • βœ… No personally identifiable information (PII) retained
  • βœ… FERPA/GDPR compliant
  • βœ… Data encryption recommended for deployment
  • βœ… Access control and authentication required

πŸ“š Documentation

Comprehensive Documentation Included

  1. Technical Specifications - Complete implementation details
  2. SQL Query Documentation - Database operations explained
  3. Python Code Comments - Inline code documentation
  4. R Script Comments - Visualization code explained
  5. API Documentation - How to use functions and modules

Author

Abhishek Yewale
πŸ“§ Email: [abhishekyewale067@gmail.com]
πŸ”— LinkedIn: [www.linkedin.com/in/abhishek-yewale-70]
🌐 GitHub: @AbhiYewale96
πŸ“… Date: 25/02/2025


πŸ“Š Project Statistics

Metric Value
Total Lines of Code 500+
SQL Queries 15+
Python Functions 20+
R Visualizations 10+
Documentation Pages 50+
Analysis Hours 40+
Development Time 2 weeks

πŸ”„ Version History

Version 1.0 (25/02/2025)

  • βœ… Initial project completion
  • βœ… SQL database queries
  • βœ… Python ML analysis
  • βœ… R visualizations
  • βœ… Power BI dashboard
  • βœ… Comprehensive documentation
  • βœ… Professional reports

🚦 Status & Roadmap

Current Status: βœ… COMPLETE

Future Enhancements

  • Advanced ML models (Random Forest, XGBoost)
  • Real-time data pipeline
  • Web dashboard (Flask/Django)
  • API development (FastAPI)
  • Mobile app integration
  • Automated reporting system
  • Deep learning models
  • Cloud deployment (AWS/Azure)

πŸ“’ Acknowledgments

  • Dataset sourced from academic institution
  • Analysis performed using industry-standard tools
  • Visualization best practices from data science community
  • Statistical methods from peer-reviewed literature

⚠️ Disclaimer

This analysis is based on historical data and predictive models. Results should be interpreted in context with other factors affecting student performance. The recommendations are evidence-based but should be adapted to specific institutional contexts.


πŸ“Œ Getting Help

Troubleshooting

Issue: Python packages not installing

pip install --upgrade pip
pip install -r requirements.txt --no-cache-dir

Issue: SQL connection errors

# Verify database credentials
# Check database service is running
# Validate connection string

Issue: R package dependencies

install.packages(c("ggplot2", "dplyr", "tidyverse"))

πŸŽ‰ Conclusion

This Student Performance Analysis project demonstrates comprehensive data analysis capabilities combining statistical methods, machine learning, and business intelligence. The insights generated can drive meaningful improvements in educational outcomes.


Last Updated: 25/02/2025
Repository: GitHub
Status: Active & Maintained


πŸ“œ Additional Resources


Made with by Abhishek Yewale
Turning Data into Insights

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors