A full-stack machine learning application for diabetes risk assessment, featuring custom ML algorithms implemented from scratch alongside industry-standard libraries. The system provides dual interfaces for healthcare professionals and general users.
- 4 ML algorithms implemented from scratch (no sklearn for core logic):
  - Decision Tree with Gini/Entropy, pruning, and visualization
  - Support Vector Machine using CVXOPT quadratic programming
  - LightGBM with histogram-based gradient boosting
  - XGBoost gradient boosting (scratch implementation)
- Side-by-side comparison with sklearn/library implementations
- 16 trained models covering all algorithm × implementation × user-type combinations
- RESTful API with Flask backend
- Modern React 19 frontend with Tailwind CSS
- Dual user interfaces: Doctor (33 features) vs Patient (23 features)
- AI-powered health advice via Grok API integration
- Data visualization with Recharts and decision tree plots
| Model | Accuracy | Precision | Recall | F1 Score | Training Time |
|---|---|---|---|---|---|
| XGBoost (Library) | 96.8% | 100% | 92.0% | 95.8% | 0.30s |
| XGBoost (Scratch) | 97.3% | 100% | 93.3% | 96.5% | 101s |
| Decision Tree (sklearn) | 95.2% | 98.5% | 89.3% | 93.7% | 0.03s |
| Decision Tree (Scratch) | 95.7% | 98.5% | 90.6% | 94.4% | 3.8s |
| LightGBM | 96.8% | 100% | 92.0% | 95.8% | 0.0004s |
| SVM (sklearn) | 86.7% | 87.9% | 77.3% | 82.2% | 0.10s |
| SVM (Scratch/CVXOPT) | 88.3% | 88.4% | 81.3% | 84.7% | 1.45s |
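
The F1 column is the harmonic mean of the precision and recall columns, so the rows can be cross-checked with a few lines of Python. This is only a sanity-check sketch; the helper name `f1_score` is illustrative and is not taken from the project code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the table above, written as fractions.
rows = {
    "XGBoost (Library)": (1.000, 0.920),
    "Decision Tree (sklearn)": (0.985, 0.893),
    "SVM (Scratch/CVXOPT)": (0.884, 0.813),
}

for name, (p, r) in rows.items():
    print(f"{name}: F1 = {f1_score(p, r) * 100:.1f}%")
```

For these three rows the script prints 95.8%, 93.7%, and 84.7%, matching the F1 values reported in the table.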
```
diabetes-prediction/
├── backend/
│   ├── ml_algorithms/              # Refactored ML implementations (documented)
│   │   ├── decision_tree.py        # Decision Tree with full documentation
│   │   ├── svm.py                  # SVM using CVXOPT
│   │   └── lightgbm.py             # Histogram-based gradient boosting
│   ├── DecisionTree.py             # Original Decision Tree implementation
│   ├── SVM_fromScratch.py          # Original SVM implementation
│   ├── LightGBM_fromScratch.py     # Original LightGBM implementation
│   ├── models/                     # 16 pre-trained .pkl models
│   ├── data/                       # Dataset
│   ├── app.py                      # Main Flask API (port 5000)
│   ├── advice.py                   # AI health advice service (port 5001)
│   └── requirements.txt
├── frontend/
│   ├── src/
│   │   ├── pages/                  # React pages
│   │   │   ├── PredictPatient.js
│   │   │   ├── PredictDoctor.js
│   │   │   ├── Comparison.js       # Model comparison dashboard
│   │   │   └── ...
│   │   ├── components/             # Reusable UI components
│   │   └── utils/                  # Helper functions
│   └── package.json
├── docs/
│   ├── ARCHITECTURE.md             # System architecture documentation
│   └── ML_ALGORITHMS.md            # Detailed ML algorithm explanations
└── README.md
```
- Supports Gini impurity and Entropy criteria
- Implements pre-pruning: max_depth, min_samples_split, min_samples_leaf
- Best-first leaf splitting with max_leaf_nodes constraint
- Class weighting for imbalanced datasets
- Custom tree visualization using matplotlib
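
To make the split criteria listed above concrete, here is a minimal, standard-library sketch of Gini impurity, entropy, and a threshold search on a single feature with a `min_samples_leaf` check. It is not the project's implementation: the function names and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: -sum(p_k * log2(p_k))."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold(values, labels, impurity=gini, min_samples_leaf=1):
    """Return (threshold, impurity_decrease) for the best split on one feature."""
    n = len(labels)
    parent = impurity(labels)
    best = (None, 0.0)
    # Candidate thresholds are midpoints between consecutive distinct values.
    distinct = sorted(set(values))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        # Pre-pruning: skip splits that would create a leaf that is too small.
        if len(left) < min_samples_leaf or len(right) < min_samples_leaf:
            continue
        child = (len(left) * impurity(left) + len(right) * impurity(right)) / n
        if parent - child > best[1]:
            best = (t, parent - child)
    return best


# Toy data (made up for illustration): a glucose-like feature and a binary label.
glucose = [85, 90, 110, 140, 150, 160]
diabetic = [0, 0, 0, 1, 1, 1]
print(best_threshold(glucose, diabetic))  # (125.0, 0.5)
```

Passing `impurity=entropy` switches the criterion without changing the search, which is one simple way to support both options behind a single `criterion` parameter.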
```python
# Example usage
from DecisionTree import DecisionTreeClassifierScratch

model = DecisionTreeClassifierScratch(
    criterion="gini",
    max_depth=10,
    min_samples_split=5,
    class_weight="balanced",
)
# X_train, y_train: training features and labels prepared beforehand
model.fit(X_train, y_train)
model.plot_tree(class_names=['Normal', 'Diabetic'])
```

- Solves the dual optimization problem using quadratic programming
- Supports RBF, Linear, and Polynomial kernels
- Implements soft-margin SVM with C parameter
- Extracts support vectors for interpretability
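
As a rough illustration of the dual formulation mentioned above, the sketch below builds the matrices of the soft-margin SVM dual in the standard QP form that CVXOPT's `solvers.qp(P, q, G, h, A, b)` expects. It uses only the standard library and stops before the solve step; `rbf_kernel`, `build_dual_qp`, the `gamma` default, and the toy data are all assumptions for illustration, not the project's code.

```python
import math


def rbf_kernel(a, b, gamma=0.5):
    """RBF kernel: exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def build_dual_qp(X, y, C=1.0, kernel=rbf_kernel):
    """Build the soft-margin SVM dual as a standard-form QP.

        minimize   (1/2) a^T P a + q^T a
        subject to G a <= h   (encodes 0 <= a_i <= C)
                   A a  = b   (encodes sum_i a_i * y_i = 0)
    """
    n = len(X)
    # P_ij = y_i * y_j * K(x_i, x_j)
    P = [[y[i] * y[j] * kernel(X[i], X[j]) for j in range(n)] for i in range(n)]
    q = [-1.0] * n
    # Stack -I (for a_i >= 0) on top of +I (for a_i <= C).
    G = [[-1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    G += [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    h = [0.0] * n + [float(C)] * n
    A = [[float(label) for label in y]]
    b = [0.0]
    return P, q, G, h, A, b


# Toy data (made up): two points per class, labels in {-1, +1}.
X = [[0.0, 0.0], [0.0, 1.0], [2.0, 2.0], [2.0, 3.0]]
y = [-1, -1, 1, 1]
P, q, G, h, A, b = build_dual_qp(X, y, C=10.0)
print(len(P), len(G), h[-1])  # 4 8 10.0
```

To solve the problem, each list would be converted to a `cvxopt.matrix` of doubles before calling `solvers.qp` (note that `cvxopt.matrix` reads a nested list column by column). The support vectors are then the training points whose multipliers come out clearly above zero.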
- Histogram-based gradient boosting for efficiency
- Leaf-wise tree growth strategy
- Binary classification with log loss optimization
- Custom binning for continuous features
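
The histogram idea behind the points above can be sketched in a few functions: bin a continuous feature, accumulate log-loss gradients and Hessians per bin, then scan bin boundaries for the best split gain. This is a simplified illustration under stated assumptions (equal-width bins, a single feature, an L2 term `reg_lambda`); the project's own binning scheme and function names may differ.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logloss_grad_hess(raw_score, label):
    """First and second derivatives of log loss with respect to the raw score."""
    p = sigmoid(raw_score)
    return p - label, p * (1.0 - p)


def bin_feature(values, n_bins=4):
    """Map continuous values to equal-width bin indices in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant feature
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def best_histogram_split(bins, grads, hesses, n_bins=4, reg_lambda=1.0):
    """Return (bin_index, gain); samples with bin <= bin_index go left."""
    G = [0.0] * n_bins
    H = [0.0] * n_bins
    for b, g, h in zip(bins, grads, hesses):
        G[b] += g
        H[b] += h
    G_tot, H_tot = sum(G), sum(H)

    def score(g, h):
        # Second-order leaf score: G^2 / (H + lambda)
        return g * g / (h + reg_lambda)

    best = (None, 0.0)
    G_left = H_left = 0.0
    for b in range(n_bins - 1):
        G_left += G[b]
        H_left += H[b]
        gain = (score(G_left, H_left)
                + score(G_tot - G_left, H_tot - H_left)
                - score(G_tot, H_tot))
        if gain > best[1]:
            best = (b, gain)
    return best


# Toy data (made up): one feature, binary labels, initial raw score 0 for every sample.
values = [85, 90, 110, 140, 150, 160]
labels = [0, 0, 0, 1, 1, 1]
grads, hesses = zip(*(logloss_grad_hess(0.0, y) for y in labels))
bins = bin_feature(values)
split_bin, gain = best_histogram_split(bins, grads, hesses)
print(bins, split_bin)  # [0, 0, 1, 2, 3, 3] 1
```

Scanning a handful of bins instead of every distinct feature value is what makes the histogram approach cheap. A leaf-wise grower would repeat this search for every current leaf and split the one with the largest gain.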
- Python 3.9+
- Node.js 18+
```bash
cd backend
python -m venv venv
venv\Scripts\activate            # Windows (macOS/Linux: source venv/bin/activate)
pip install -r requirements.txt
python app.py                    # Runs on port 5000
python advice.py                 # AI advice service on port 5001 (run in a second terminal)
```

```bash
cd frontend
npm install
npm start                        # Runs on port 3000
```

- Simple health questionnaire
- Easy-to-understand risk assessment
- Personalized AI health advice
- Historical prediction tracking
- Clinical measurements (Blood pressure, HbA1c, Cholesterol, etc.)
- Detailed model comparison
- Decision tree visualization
- Professional diagnostic support
Backend:
- Flask + Flask-CORS
- NumPy, Pandas, Scikit-learn
- CVXOPT (for SVM optimization)
- XGBoost, LightGBM
- OpenAI SDK (Grok API)
Frontend:
- React 19 + React Router
- Tailwind CSS + Framer Motion
- Recharts (data visualization)
- React Image Gallery
- Deep ML Understanding: Implementing the algorithms from scratch requires working through the underlying mathematics (impurity measures, the SVM dual problem, gradient boosting)
- Software Architecture: Clean separation of concerns, RESTful API design
- Full-Stack Development: End-to-end application from ML models to user interface
- User-Centric Design: Dual interfaces for different user expertise levels
- AI Integration: Modern LLM API integration for enhanced user experience
MIT License - feel free to use this project for learning and reference.
[Tran Le Minh Nhat - Minhmarks]
⭐ If you find this project helpful, please give it a star!