A comprehensive machine learning project for healthcare data analysis featuring breast cancer classification and diabetes risk prediction models with an interactive web dashboard.
Try the Live Application:
- Streamlit App: Healthcare ML Dashboard
- Repository: GitHub Repository
This repository contains a complete healthcare machine learning solution with two main components:
- Jupyter Notebook Analysis - Detailed exploratory data analysis and model development
- Streamlit Web Application - Interactive dashboard for real-time predictions
healthcare-ml-dashboard/
├── Healthcare_ML_Models_Annotated_and_Visualized.ipynb # Main analysis notebook
├── Healthcare_ML_Models_Organized.ipynb # Clean organized code
├── healthcare_app.py # Streamlit web application
├── requirements.txt # Python dependencies
├── resources/ # Images and assets
│ ├── README.md # Image guidelines
│ ├── header_image.jpg # Dashboard header image
│ ├── breast_cancer_model.jpg # Breast cancer section image
│ ├── diabetes_model.jpg # Diabetes section image
│ └── data_exploration.jpg # Data exploration image
├── .gitignore # Git ignore file
└── README.md # This file
- Python 3.7 or higher
- pip package manager
- Clone or download the repository:
  git clone https://github.com/Uday2kranth/Health-care-ML-Final-project.git
  cd Health-care-ML-Final-project
- Install dependencies:
  pip install -r requirements.txt
- Run the Streamlit app:
  streamlit run healthcare_app.py
- Open in browser:
  - The app opens automatically at http://localhost:8501
  - If it does not, navigate manually to the URL shown in the terminal
- Fork this repository
- Sign up at Streamlit Cloud
- Connect your GitHub repository
- Deploy with one click
- Live app: Healthcare ML Dashboard
- Railway: Simple deployment with GitHub integration
- Render: Free tier available for small projects
- Google Cloud Platform: Professional deployment option
- AWS: Enterprise-level deployment solution
This comprehensive Jupyter notebook provides a step-by-step analysis of healthcare datasets:
- Breast Cancer Dataset: 569 samples, 30 features
- Diabetes Dataset: 442 samples, 10 features
- Initial data inspection and basic statistics
- Statistical Summary: Mean, median, standard deviation for all features
- Class Distribution: Visualization of target variable distributions
- Feature Correlations: Correlation heatmaps to identify relationships
- Data Visualization:
  - Histograms for feature distributions
  - Box plots for outlier detection
  - Scatter plots for feature relationships
- Missing Value Analysis: Check for null values
- Feature Selection: Identify most important features
- Data Scaling: Normalize features for better model performance
- Train-Test Split: Prepare data for model training
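To make the scaling and splitting steps concrete, here is a dependency-free sketch of the arithmetic involved. The notebook most likely uses scikit-learn utilities for this; the helper names `train_test_split_indices`, `fit_scaler`, and `standardize` are hypothetical, and the numbers are made up for illustration. The point it shows is that the mean and standard deviation are fitted on the training rows only and then reused for the test rows.

```python
import random
import statistics


def train_test_split_indices(n_samples, test_fraction=0.2, seed=42):
    """Shuffle row indices and split them into (train, test) index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


def fit_scaler(column):
    """Return (mean, std) for one feature column; std falls back to 1.0 if constant."""
    mean = statistics.mean(column)
    std = statistics.pstdev(column)  # population std (divides by n)
    return mean, (std if std > 0 else 1.0)


def standardize(column, mean, std):
    """Apply z-score scaling with previously fitted statistics."""
    return [(value - mean) / std for value in column]


# Toy feature column (illustrative values, not taken from either dataset).
feature = [12.0, 15.5, 9.8, 20.1, 17.3, 11.2, 14.9, 18.6, 10.4, 16.0]
train_idx, test_idx = train_test_split_indices(len(feature), test_fraction=0.2)

train_values = [feature[i] for i in train_idx]
test_values = [feature[i] for i in test_idx]

mean, std = fit_scaler(train_values)                # fit on training rows only
train_scaled = standardize(train_values, mean, std)
test_scaled = standardize(test_values, mean, std)   # reuse training statistics
```

Fitting the scaler on the full dataset before splitting would leak test-set information into training, which is why the statistics come from `train_values` alone.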
- Algorithm: XGBoost Classifier
- Features Used:
  - Worst radius
  - Worst perimeter
  - Mean concave points
  - Worst concave points
  - Worst area
- Performance Metrics:
  - Accuracy: ~95%
  - Precision, Recall, F1-Score
  - Confusion Matrix
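The classification metrics listed above all derive from the four confusion-matrix counts. The sketch below shows that relationship in plain Python. The function name `classification_metrics` and the counts are hypothetical, chosen only for illustration; they are not the notebook's actual results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for a 114-sample test split (illustrative only).
metrics = classification_metrics(tp=69, fp=3, fn=2, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In a screening context recall on the malignant class usually matters most, since a false negative is a missed diagnosis. That is a reason to report precision and recall alongside accuracy.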
- Algorithm: XGBoost Regressor
- Features Used: All 10 diabetes features
- Performance Metrics:
  - R² Score: ~0.5
  - Mean Squared Error
  - Mean Absolute Error
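For reference, the three regression metrics above can be computed by hand as follows. This is a minimal sketch with a hypothetical helper name (`regression_metrics`) and toy values. The notebook presumably uses a library implementation instead.

```python
def regression_metrics(y_true, y_pred):
    """Return MSE, MAE and R² for paired lists of true and predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)             # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "mse": ss_res / n,
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1.0 - ss_res / ss_tot if ss_tot else 0.0,
    }


# Toy values for illustration (not model output).
scores = regression_metrics([100, 150, 200, 250], [110, 140, 210, 240])
print(scores)
```

An R² of about 0.5 means the model explains roughly half of the variance in the target; the rest is left in the residuals.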
- Feature Importance Plots: Bar charts showing feature contributions
- Confusion Matrix: Visual representation of classification results
- Prediction vs Actual: Scatter plots for regression analysis
- ROC Curves: Model performance evaluation
- Cross-validation: Ensure model robustness
- Performance Analysis: Detailed metrics interpretation
- Model Comparison: Baseline vs optimized models
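As a rough illustration of the cross-validation step, the sketch below generates k-fold train/validation index splits in plain Python. The function name `k_fold_indices` is hypothetical, and the folds here are contiguous and unshuffled for simplicity. A real workflow would typically shuffle or stratify, for example with scikit-learn's `KFold` or `StratifiedKFold`.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


# 569 matches the breast cancer sample count stated earlier.
folds = list(k_fold_indices(n_samples=569, k=5))
for i, (train, validation) in enumerate(folds, start=1):
    print(f"fold {i}: {len(train)} train rows, {len(validation)} validation rows")
```

Each sample lands in exactly one validation fold, so averaging the per-fold score gives a less split-dependent estimate than a single train-test split.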
The interactive web application provides a user-friendly interface for healthcare predictions:
# Main components
├── load_and_prepare_data() # Data loading and caching
├── train_breast_cancer_model() # Model training with caching
├── train_diabetes_model() # Model training with caching
├── load_image_from_resources() # Image loading functionality
└── main() # Main application logic
- Model Selection: Choose between three main sections
- Project Information: Overview of technologies used
- Responsive Design: Clean, professional interface
- Input Interface: Interactive sliders for patient measurements
- Real-time Prediction: Instant classification results
- Confidence Scores: Probability of benign vs malignant
- Visual Analytics: Feature importance charts
- Health Metrics Input: Patient health parameter sliders
- Risk Assessment: Continuous risk score calculation
- Performance Visualization: Actual vs predicted scatter plots
- Risk Interpretation: Low, moderate, high risk categories
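The mapping from a continuous risk score to the low, moderate, and high categories could look like the sketch below. The function name `interpret_risk` and the cutoffs of 100 and 200 are assumptions made for illustration, not the thresholds used in `healthcare_app.py`. Only the 25-346 score range comes from this README.

```python
def interpret_risk(score, low_cutoff=100.0, high_cutoff=200.0):
    """Map a continuous risk score to 'Low', 'Moderate' or 'High'.

    The default cutoffs are hypothetical placeholders; replace them with
    the thresholds the application actually uses.
    """
    if low_cutoff >= high_cutoff:
        raise ValueError("low_cutoff must be smaller than high_cutoff")
    if score < low_cutoff:
        return "Low"
    if score < high_cutoff:
        return "Moderate"
    return "High"


# Endpoints of the documented score range plus one mid-range value.
for score in (25, 150, 346):
    print(score, interpret_risk(score))
```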
- Dataset Overview: Summary statistics and distributions
- Interactive Charts: Pie charts, histograms, correlation heatmaps
- Comparative Analysis: Side-by-side dataset comparison
- Caching: @st.cache_data and @st.cache_resource for performance
- Responsive Layout: Multi-column layouts and tabs
- Error Handling: Graceful error management
- Image Support: Dynamic image loading from resources folder
- Modern Design: Gradient backgrounds and professional styling
- Interactive Elements: Sliders, buttons, and dropdowns
- Visual Feedback: Color-coded prediction results
- Responsive Layout: Works on different screen sizes
# After cloning/downloading
git clone https://github.com/Uday2kranth/Health-care-ML-Final-project.git
cd Health-care-ML-Final-project
pip install -r requirements.txt
# Start Jupyter
jupyter notebook
# Open in browser
# Navigate to Healthcare_ML_Models_Annotated_and_Visualized.ipynb
# Run cells sequentially from top to bottom
# Start the web application
streamlit run healthcare_app.py
# Access in browser
# Navigate to http://localhost:8501
- Fork this repository to your GitHub account
- Sign up at Streamlit Cloud
- Connect your GitHub repository
- Click "Deploy app"
- Select Model from the sidebar:
  - Breast Cancer Classification
  - Diabetes Risk Analysis
  - Data Exploration
- Input Data using interactive sliders:
  - Adjust values based on patient measurements
  - See real-time updates
- Get Predictions:
  - Click prediction buttons
  - View confidence scores and risk levels
  - Analyze visualization charts
- Explore Data:
  - Review dataset statistics
  - Examine feature correlations
  - Compare model performances
# Add images to resources folder
resources/
├── header_image.jpg # Main dashboard header
├── breast_cancer_model.jpg # Breast cancer section
├── diabetes_model.jpg # Diabetes section
└── data_exploration.jpg # Data exploration section
- Edit train_breast_cancer_model() for different features
- Adjust train_diabetes_model() for different algorithms
- Update visualization functions for custom charts
- Scikit-learn: Data preprocessing and metrics
- XGBoost: Gradient boosting models
- Pandas: Data manipulation
- NumPy: Numerical operations
- Plotly: Interactive charts
- Matplotlib: Static plots
- Seaborn: Statistical visualizations
- Streamlit: Web application framework
- PIL (Pillow): Image processing
- Jupyter: Interactive development
- Git: Version control
- Accuracy: ~95%
- Precision: High precision for both classes
- Recall: Balanced recall scores
- Features: 5 most important features selected
- R² Score: ~0.5
- MSE: Reasonable prediction error
- Feature Importance: All 10 features contribute
- Range: Risk scores from 25-346
- Import Errors
  # Ensure all packages are installed
  pip install -r requirements.txt
- Port Already in Use
  # Use a different port
  streamlit run healthcare_app.py --server.port 8502
- Image Loading Issues
  # Check resources folder exists
  # Verify image file names match exactly
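The image checks can be automated with a few lines of standard-library Python. This is a sketch, not part of the application: `find_missing_images` is a hypothetical helper, and the expected file names are the ones listed in the project structure. Names are compared exactly, so a case mismatch that happens to work on a case-insensitive local filesystem is still reported before it breaks on a Linux host.

```python
from pathlib import Path

# File names as listed under resources/ in the project structure.
EXPECTED_IMAGES = (
    "header_image.jpg",
    "breast_cancer_model.jpg",
    "diabetes_model.jpg",
    "data_exploration.jpg",
)


def find_missing_images(resources_dir="resources", expected=EXPECTED_IMAGES):
    """Return the expected image names not present in resources_dir.

    Comparison is exact (case-sensitive) against the names on disk.
    """
    folder = Path(resources_dir)
    if not folder.is_dir():
        return list(expected)  # folder itself is missing
    present = {path.name for path in folder.iterdir() if path.is_file()}
    return [name for name in expected if name not in present]


missing = find_missing_images()
if missing:
    print("Missing images:", ", ".join(missing))
else:
    print("All expected images found.")
```

Run it from the repository root so the relative `resources` path resolves correctly.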
- Fork the repository
- Create a feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Live Application:
- Primary: Healthcare ML Dashboard - Streamlit Cloud Deployment
- Repository: GitHub Repository - Source Code
This project is for educational purposes. Please ensure compliance with healthcare data regulations in production use.
- Scikit-learn for providing healthcare datasets
- Streamlit for the web framework
- XGBoost for high-performance machine learning
- Plotly for interactive visualizations
Important Note: This application is for educational and demonstration purposes only. Always consult qualified healthcare professionals for medical advice and diagnosis.
For questions or issues:
- Check the troubleshooting section
- Review the code documentation
- Open an issue on the repository
- Consult the official documentation for used libraries
Happy Healthcare Analytics!