========================================================
L M S | Predictive Analytics & Insight Dashboard
========================================================
Machine learning pipeline and lightweight analytics dashboard for Learning Management System (LMS) data. Includes feature engineering, model training (XGBoost + Random Forest baseline), evaluation artifacts, and a credential‑protected Streamlit interface.
- Reproducible pipeline: collection → preprocessing → feature engineering → training → evaluation.
- Models: XGBoost (GPU aware) and Random Forest baseline.
- Feature engineering: statistical aggregates, ratios, temporal usage, optional PCA + ANOVA F‑test selection (sketched below, after the repository layout).
- Evaluation artifacts: confusion matrices, prediction distribution, feature importance.
- Dashboard: Streamlit with encrypted credential store (Fernet) for simple access control.
- Modular scripts enable selective re‑runs using cached intermediate CSV outputs.
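A minimal sketch of this cached, stage-wise execution (the function and file names here are illustrative, not the repository's actual API):

```python
from pathlib import Path
import pandas as pd

def run_stage(output_csv: str, build, force: bool = False) -> pd.DataFrame:
    """Re-run a pipeline stage only when its cached CSV is missing (or force=True)."""
    path = Path(output_csv)
    if path.exists() and not force:
        return pd.read_csv(path)   # reuse the cached intermediate
    df = build()                   # recompute the stage
    df.to_csv(path, index=False)
    return df

# Example wiring: each stage feeds the next via its cached CSV.
# merged = run_stage("merged.csv", collect_raw_tables)
# clean  = run_stage("clean.csv", lambda: preprocess(merged))
```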
Repository layout:

```
Dataset/                    Raw LMS CSV inputs
data_collection.py          Merge & normalize source tables
data_preprocessing.py       Cleaning, encoding, scaling, train/test split
feature_engineering.py      Derived features (counts, temporal, ratios, PCA/ANOVA)
model_training.py           Train & persist XGBoost / RandomForest models
model_evaluation.py         Metrics + plots (confusion matrices, distributions)
streamlit_app.py            Authenticated analytics dashboard
insight_generation.py       Higher-level narrative insights (optional use)
generate_credentials.py     Create encrypted credentials store
credentials_encrypted.pkl   Encrypted user/password data
key.key                     Symmetric encryption key (keep secret!)
plots/                      Generated visualizations
model.xgb                   Saved primary model artifact
requirements.txt            Dependency lock snapshot
README.md                   Project documentation
```
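The optional PCA + ANOVA F‑test step in feature_engineering.py could be composed from scikit-learn primitives along these lines (a sketch with illustrative `k` and `n_components` values, not the repository's actual code):

```python
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline

# ANOVA F-test keeps the k features most associated with the target label;
# PCA then projects the survivors onto uncorrelated components.
reduce_features = Pipeline([
    ("anova", SelectKBest(score_func=f_classif, k=20)),
    ("pca", PCA(n_components=10)),
])
# X_reduced = reduce_features.fit_transform(X_train, y_train)
```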
Suggested future refinement (not applied):

```
src/
data/
features/
models/
api/
dashboard/
tests/
```
This repository snapshot demonstrates:
- Data merging and preprocessing pipeline design.
- Feature engineering strategy for educational performance prediction.
- Comparative modeling (gradient boosting vs. an ensemble baseline); a sketch follows below.
- Artifact-driven evaluation (plots committed for review).

Operational run instructions are intentionally omitted to keep the focus on repository structure and deliverables.
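A sketch of how the side-by-side comparison might be wired (the hyperparameters and accuracy-only scoring are illustrative placeholders, not the project's tuned values):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

def compare_models(X_train, y_train, X_test, y_test) -> dict:
    """Train the gradient-boosting model and the Random Forest baseline side by side."""
    models = {
        "xgboost": XGBClassifier(n_estimators=300, max_depth=6),
        "random_forest": RandomForestClassifier(n_estimators=300, random_state=42),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores
```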
Core model evaluation artifacts (committed for reference):
| Overall | XGBoost | Random Forest |
|---|---|---|
| ![]() | ![]() | ![]() |

| Feature Importance | Prediction Distribution |
|---|---|
| ![]() | ![]() |
Interpretation (abridged): off‑diagonal cells in the confusion matrices indicate systematic misclassifications; the feature importance chart highlights which inputs carry the most leverage; the prediction distribution plot surfaces class imbalance considerations.
Primary metrics: accuracy, precision/recall, confusion matrices, gain/impurity feature importance. Potential extensions (not implemented here): cross‑validation, hyperparameter optimization, SHAP explanations, calibration curves.
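For illustration, the committed confusion-matrix artifacts could be produced along these lines (a sketch; only the plots/ destination comes from the repository layout, everything else is assumed):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, classification_report

def save_evaluation_artifacts(model, X_test, y_test, name: str) -> None:
    """Persist a confusion matrix plot and print per-class precision/recall."""
    y_pred = model.predict(X_test)
    print(classification_report(y_test, y_pred))
    ConfusionMatrixDisplay.from_predictions(y_test, y_pred)
    plt.title(f"Confusion matrix: {name}")
    plt.savefig(f"plots/confusion_matrix_{name}.png", bbox_inches="tight")
    plt.close()
```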
The XGBoost configuration uses a GPU automatically when a GPU-enabled build is installed; otherwise it falls back to CPU.
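One way to implement that fallback, assuming XGBoost ≥ 2.0 (where `device="cuda"` selects the GPU; older builds use `tree_method="gpu_hist"` instead):

```python
import xgboost as xgb

def make_classifier(**params) -> xgb.XGBClassifier:
    """Prefer GPU training when a CUDA-enabled XGBoost build and device are present."""
    try:
        probe = xgb.XGBClassifier(tree_method="hist", device="cuda", n_estimators=1)
        probe.fit([[0.0], [1.0]], [0, 1])  # fails fast without usable CUDA support
        return xgb.XGBClassifier(tree_method="hist", device="cuda", **params)
    except xgb.core.XGBoostError:
        return xgb.XGBClassifier(tree_method="hist", device="cpu", **params)
```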
Dashboard authentication uses an encrypted (Fernet) credential store suitable for lightweight demo access control.
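For illustration, the round trip through generate_credentials.py and the dashboard login might look like this (the pickled dict schema and demo values are assumptions; key.key and credentials_encrypted.pkl match the repository layout):

```python
import pickle
from cryptography.fernet import Fernet

# Generate a key once and keep it out of version control (here: key.key).
key = Fernet.generate_key()
with open("key.key", "wb") as f:
    f.write(key)

# Encrypt a small user -> password mapping into credentials_encrypted.pkl.
credentials = {"demo_user": "demo_password"}          # placeholder values
token = Fernet(key).encrypt(pickle.dumps(credentials))
with open("credentials_encrypted.pkl", "wb") as f:
    f.write(token)

# At login time the dashboard reverses the process.
with open("key.key", "rb") as f:
    fernet = Fernet(f.read())
with open("credentials_encrypted.pkl", "rb") as f:
    restored = pickle.loads(fernet.decrypt(f.read()))
assert restored == credentials
```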
Minimal container assets illustrate packaging (not an active deployment pipeline):
Dockerfile (excerpt): Python 3.11 slim base, installs requirements, launches Streamlit on port 8501.
Build & run (optional):

```
docker build -t lms .
docker run --rm -p 8501:8501 lms
```

Or with Compose:

```
docker compose up --build
```

Result: the dashboard is accessible at http://localhost:8501.