A complete machine learning system that predicts customer churn, explains why customers are leaving using SHAP (Explainable AI), and recommends business actions to retain them.
- Project Overview
- How It Works
- Project Architecture
- Dataset Description
- Machine Learning Models
- SHAP Explainability
- Action Recommendation System
- Installation Guide
- How to Run
- Sample Output
- Technologies Used
Customer churn refers to when customers stop using a company's product or service. For subscription-based businesses like Netflix, Spotify, Amazon Prime, or telecom operators, churn directly impacts revenue.
This AI system provides three core capabilities:
| Feature | Description |
|---|---|
| 1. Churn Prediction | Predicts whether a customer will leave (churn) or stay |
| 2. Explainability | Explains WHY the AI made that prediction using SHAP values |
| 3. Action Recommendations | Suggests specific business actions to retain at-risk customers |
- Companies lose millions due to customer churn
- Early identification of at-risk customers enables proactive retention
- Understanding WHY customers leave helps improve business strategies
- Automated recommendations enable fast, data-driven decisions
The diagram below shows the complete workflow of the system, from data input and preprocessing through prediction, SHAP explanation, and action recommendation.
┌─────────────────────────────────────────────────────────────────────────┐
│ SYSTEM WORKFLOW │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ 1. DATA INPUT │
│ ┌──────────────────┐ │
│ │ Customer Data │ age, gender, subscription, login frequency, │
│ │ (10 features) │ last login days, watch time, payment failures, │
│ │ │ support calls, tenure, monthly charges │
│ └────────┬─────────┘ │
│ │ │
│ ▼ │
│ 2. PREPROCESSING │
│ ┌──────────────────┐ │
│ │ Encode & Scale │ Convert text to numbers, normalize values │
│ └────────┬─────────┘ │
│ │ │
│ ▼ │
│ 3. ML PREDICTION │
│ ┌──────────────────┐ │
│ │ Trained Model │ Predicts churn probability (0-100%) │
│ │ (Random Forest) │ Risk Level: HIGH / MODERATE / LOW │
│ └────────┬─────────┘ │
│ │ │
│ ▼ │
│ 4. SHAP EXPLANATION │
│ ┌──────────────────┐ │
│ │ Feature Impact │ "Low login frequency increased churn by 25%" │
│ │ Analysis │ "Payment failures increased churn by 18%" │
│ └────────┬─────────┘ │
│ │ │
│ ▼ │
│ 5. ACTION RECOMMENDATION │
│ ┌──────────────────┐ │
│ │ Business Rules │ "Offer 20% discount" │
│ │ Engine │ "Send re-engagement email" │
│ └──────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
This block diagram provides a high-level overview of the system's architecture, showcasing the interaction between the data processing, model training, explanation, and recommendation modules.
customer_churn_project/
│
├── 📂 data/
│ ├── generate_data.py # Creates synthetic customer dataset
│ └── customers.csv # Generated dataset (5000 customers)
│
├── 📂 model/
│ ├── churn_model.pkl # Trained machine learning model
│ ├── encoder.pkl # Feature encoders (gender, subscription)
│ └── feature_names.pkl # List of feature column names
│
├── 📂 src/
│ ├── train_model.py # Trains and compares 3 ML algorithms
│ ├── predict.py # Makes churn predictions
│ ├── explain.py # SHAP-based explainability
│ └── recommend.py # Action recommendation engine
│
├── 📂 ui/
│ └── app.py # Tkinter desktop application
│
└── README.md # This documentation file
| File | Purpose |
|---|---|
| `data/generate_data.py` | Generates 5000 synthetic customers with realistic churn patterns. Uses probability-based logic where low engagement, payment failures, and high support calls correlate with higher churn. |
| `data/customers.csv` | The training dataset with 10 features and 1 target variable (churn). Contains ~40% churned customers. |
| `src/train_model.py` | Loads data, preprocesses features, trains Logistic Regression, Naive Bayes, and Random Forest. Compares accuracy and saves the best model. |
| `src/predict.py` | Loads the saved model and makes predictions on new customer data. Returns probability and risk level. |
| `src/explain.py` | Uses SHAP (SHapley Additive exPlanations) to calculate feature importance for each prediction. |
| `src/recommend.py` | Rule-based engine that suggests actions based on churn probability and specific customer factors. |
| `ui/app.py` | Desktop GUI built with Tkinter. Allows entering customer data and displays prediction + explanation + recommendations. |
| Feature | Type | Description | Range |
|---|---|---|---|
| `age` | Numeric | Customer's age | 18-70 years |
| `gender` | Categorical | Male or Female | Male/Female |
| `subscription_type` | Categorical | Subscription tier | Basic/Standard/Premium |
| `monthly_charges` | Numeric | Monthly subscription cost | $9.99-$39.99 |
| `tenure_in_months` | Numeric | How long they've been a customer | 1-72 months |
| `login_frequency` | Numeric | Number of logins per month | 0-60 logins |
| `last_login_days` | Numeric | Days since last login | 0-90 days |
| `watch_time` | Numeric | Hours of content watched per month | 0-100 hours |
| `payment_failures` | Numeric | Number of failed payment attempts | 0-5 failures |
| `customer_support_calls` | Numeric | Support tickets raised | 0-10 calls |
| Variable | Values | Meaning |
|---|---|---|
| `churn` | 0 | Customer will STAY |
| `churn` | 1 | Customer will LEAVE (churn) |
The synthetic data generator creates realistic correlations:
- Low login frequency → Higher churn probability
- Many days since last login → Higher churn probability
- Low watch time → Higher churn probability
- Payment failures → Strongly increases churn probability
- Many support calls → Indicates frustration, higher churn
- Short tenure → New customers churn more
- Premium subscription → Lower churn (more committed)
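The correlations listed above can be sketched with a simple probability model. The snippet below is a minimal illustration, not the contents of `data/generate_data.py`: the function names, the logistic form, and every weight are assumptions chosen only so that each factor pushes churn in the stated direction. The weights are not tuned to reproduce the ~40% churn rate of the real dataset.

```python
import math
import random


def churn_probability(c):
    """Turn one customer's attributes into a churn probability (made-up weights)."""
    score = -3.0                                      # assumed baseline log-odds
    score += 1.2 * (1 - c["login_frequency"] / 60)    # fewer logins -> more risk
    score += 1.2 * (c["last_login_days"] / 90)        # longer absence -> more risk
    score += 0.8 * (1 - c["watch_time"] / 100)        # less viewing -> more risk
    score += 0.6 * c["payment_failures"]              # largest per-unit weight
    score += 0.15 * c["customer_support_calls"]       # frustration signal
    score += 0.8 * (1 - c["tenure_in_months"] / 72)   # newer customers churn more
    if c["subscription_type"] == "Premium":
        score -= 0.5                                  # premium users are more committed
    return 1 / (1 + math.exp(-score))                 # logistic squashing into (0, 1)


def generate_customer(rng):
    """Draw one customer within the documented feature ranges, then sample churn."""
    c = {
        "age": rng.randint(18, 70),
        "gender": rng.choice(["Male", "Female"]),
        "subscription_type": rng.choice(["Basic", "Standard", "Premium"]),
        "monthly_charges": round(rng.uniform(9.99, 39.99), 2),
        "tenure_in_months": rng.randint(1, 72),
        "login_frequency": rng.randint(0, 60),
        "last_login_days": rng.randint(0, 90),
        "watch_time": round(rng.uniform(0, 100), 1),
        "payment_failures": rng.randint(0, 5),
        "customer_support_calls": rng.randint(0, 10),
    }
    # Churn is sampled, not thresholded, so the label stays noisy like real data.
    c["churn"] = 1 if rng.random() < churn_probability(c) else 0
    return c


rng = random.Random(42)  # fixed seed so the dataset is reproducible
customers = [generate_customer(rng) for _ in range(5000)]
churn_rate = sum(c["churn"] for c in customers) / len(customers)
print(f"Generated {len(customers)} customers, churn rate {churn_rate:.1%}")
```

Sampling the label from a probability, instead of applying a hard rule, keeps some overlap between churners and non-churners, which is why a model trained on such data would not be expected to reach 100% accuracy.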
| Algorithm | Description | Strengths |
|---|---|---|
| Logistic Regression | Linear model for binary classification | Fast, interpretable, good baseline |
| Naive Bayes | Probabilistic classifier | Fast training, works with small data |
| Random Forest | Ensemble of decision trees | High accuracy, handles non-linear patterns |
- Load Data: Read `customers.csv`
- Encode Categoricals: Convert gender and subscription_type to numbers
- Scale Features: Standardize numeric values to zero mean and unit variance (StandardScaler)
- Split Data: 80% training, 20% testing
- Train Models: Fit all 3 algorithms
- Evaluate: Calculate accuracy, precision, recall, F1-score
- Save Best: Store the best-performing model as a `.pkl` file
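The training steps above can be sketched with scikit-learn as follows. This is an illustrative outline, not the actual `src/train_model.py`: it uses a small synthetic stand-in instead of `customers.csv`, assumes `GaussianNB` as the Naive Bayes variant, fits the scaler on the training split only, and leaves the save step as a comment. The hyperparameters and the stratified split are likewise assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler

# Stand-in data: the real script would load and encode customers.csv instead.
rng = np.random.default_rng(42)
n = 1000
X = np.column_stack([
    rng.integers(0, 61, n),  # login_frequency
    rng.integers(0, 91, n),  # last_login_days
    rng.integers(0, 6, n),   # payment_failures
    rng.integers(0, 3, n),   # subscription_type, already encoded as 0/1/2
]).astype(float)
logit = -1.5 - 0.04 * X[:, 0] + 0.03 * X[:, 1] + 0.6 * X[:, 2] - 0.3 * X[:, 3]
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# 80% training, 20% testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit the scaler on training data only, then apply it to both splits.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train_s, y_train)
    pred = model.predict(X_test_s)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
        "f1": f1_score(y_test, pred, zero_division=0),
    }
    print(f"{name}: " + ", ".join(f"{k}={v:.3f}" for k, v in results[name].items()))

# Pick the model with the highest test accuracy.
best_name = max(results, key=lambda k: results[k]["accuracy"])
best_model = models[best_name]
print("Selected:", best_name)
# joblib.dump(best_model, "model/churn_model.pkl")  # persistence step, not run here
```

Fitting the scaler before the split would let test-set statistics leak into training, so the sketch scales after splitting. The scaler itself would also need to be saved alongside the model so that new customers are transformed the same way at prediction time.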
Model Accuracy Precision Recall F1-Score
-----------------------------------------------------------------
Logistic Regression 70.4% 67.3% 48.9% 56.6%
Naive Bayes 68.2% 64.1% 44.3% 52.4%
Random Forest 75.3% 72.2% 61.0% 66.1% [SELECTED]
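As a quick sanity check on the table above, F1 is the harmonic mean of precision and recall, so the reported F1 column can be recomputed from the other two. The helper name `f1_from` is just for illustration.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the comparison table.
reported = {
    "Logistic Regression": (0.673, 0.489, 0.566),
    "Naive Bayes": (0.641, 0.443, 0.524),
    "Random Forest": (0.722, 0.610, 0.661),
}

computed = {name: f1_from(p, r) for name, (p, r, _) in reported.items()}
for name, (_, _, f1) in reported.items():
    print(f"{name}: computed F1 = {computed[name]:.1%}, reported = {f1:.1%}")
```

All three recomputed values agree with the table to one decimal place. The gap between precision and recall in every row also shows that each model misses more churners than it falsely flags.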
SHAP (SHapley Additive exPlanations) is an explainable AI technique that:
- Calculates the contribution of each feature to the prediction
- Shows which features pushed the prediction toward churn or stay
- Provides both global (overall) and local (individual) explanations
For each customer prediction, SHAP tells us:
- Positive SHAP value: This feature INCREASES churn probability
- Negative SHAP value: This feature DECREASES churn probability
Customer: John, 25 years old, Basic subscription, 2 months tenure
Top Factors Influencing Churn:
1. Last Login Days: 45 days → INCREASES churn risk by 25%
2. Payment Failures: 2 failures → INCREASES churn risk by 18%
3. Login Frequency: 3 logins/month → INCREASES churn risk by 12%
4. Watch Time: 2.5 hours → INCREASES churn risk by 10%
5. Tenure: 2 months → INCREASES churn risk by 8%
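To make the idea of per-feature contributions concrete, the sketch below computes exact Shapley values by brute force for a tiny, hypothetical scoring function. It does not use the `shap` library or the trained model: `toy_churn_score`, its coefficients, and the baseline customer are all assumptions, and missing features are filled in from a single baseline instead of being averaged over background data as SHAP does.

```python
from itertools import combinations
from math import factorial


def toy_churn_score(x):
    """Hypothetical churn score in [0, 1]; stands in for the trained model."""
    score = 0.10
    score += 0.004 * x["last_login_days"]
    score += 0.08 * x["payment_failures"]
    score += 0.15 if x["login_frequency"] < 5 else 0.0
    # Interaction term: absence matters more when payments also fail.
    score += 0.002 * x["last_login_days"] * x["payment_failures"]
    return min(score, 1.0)


def shapley_values(model, x, baseline):
    """Exact Shapley values; features outside a coalition take baseline values."""
    features = list(x)
    n = len(features)

    def value(coalition):
        mixed = {f: (x[f] if f in coalition else baseline[f]) for f in features}
        return model(mixed)

    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


customer = {"last_login_days": 45, "payment_failures": 2, "login_frequency": 3}
baseline = {"last_login_days": 5, "payment_failures": 0, "login_frequency": 20}

phi = shapley_values(toy_churn_score, customer, baseline)
for name, contribution in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    direction = "INCREASES" if contribution > 0 else "DECREASES"
    print(f"{name}: {direction} churn score by {abs(contribution):.3f}")
```

The contributions always sum to the difference between the customer's score and the baseline score, which is the additivity property that lets SHAP output be read as "this feature moved the prediction by this much". Brute force enumerates every coalition and scales exponentially with the number of features, which is why real SHAP explainers rely on model-specific or sampling-based algorithms.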
The system uses probability thresholds to categorize risk and suggest actions:
| Churn Probability | Risk Level | Urgency | Actions |
|---|---|---|---|
| ≥ 70% | HIGH | URGENT | Offer 20-30% discount, assign account manager, send personalized retention message |
| 40% to < 70% | MODERATE | MODERATE | Send re-engagement notification, offer 10-15% discount, highlight new features |
| < 40% | LOW | LOW | Continue regular engagement, include in loyalty program |
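The threshold table translates directly into a small lookup function. The sketch below is one possible shape for it; the function name `risk_tier` and the returned dictionary layout are assumptions, not the interface of `src/recommend.py`. A probability of exactly 70% is treated as HIGH and exactly 40% as MODERATE.

```python
def risk_tier(probability):
    """Map a churn probability in [0, 1] to risk level, urgency, and actions."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability >= 0.70:
        return {
            "risk_level": "HIGH",
            "urgency": "URGENT",
            "actions": [
                "Offer 20-30% discount",
                "Assign account manager",
                "Send personalized retention message",
            ],
        }
    if probability >= 0.40:
        return {
            "risk_level": "MODERATE",
            "urgency": "MODERATE",
            "actions": [
                "Send re-engagement notification",
                "Offer 10-15% discount",
                "Highlight new features",
            ],
        }
    return {
        "risk_level": "LOW",
        "urgency": "LOW",
        "actions": ["Continue regular engagement", "Include in loyalty program"],
    }


for p in (0.94, 0.55, 0.12):
    tier = risk_tier(p)
    print(f"{p:.0%}: {tier['risk_level']} -> {tier['actions'][0]}")
```

Checking the highest threshold first keeps the branches mutually exclusive without spelling out both bounds of each range.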
The system also provides targeted actions based on specific issues:
| Factor Issue | Recommended Action |
|---|---|
| Many days since login | Send "We miss you" email with exclusive content |
| Low login frequency | Recommend personalized content |
| Low watch time | Send curated content recommendations |
| Payment failures | Reach out to resolve payment issues |
| Many support calls | Proactive outreach to resolve ongoing issues |
| Short tenure | Onboarding follow-up |
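Factor-specific rules like these are often written as a list of (condition, action) pairs. The sketch below shows that pattern. The action strings come from the table, but every numeric cutoff (for example, "many days since login" meaning more than 30 days) is an assumption made for illustration, since the table does not state the thresholds the project uses.

```python
# Each rule pairs a predicate on the customer record with a recommended action.
# All cutoffs below are illustrative assumptions, not the project's real values.
RULES = [
    (lambda c: c["last_login_days"] > 30,
     'Send "We miss you" email with exclusive content'),
    (lambda c: c["login_frequency"] < 5,
     "Recommend personalized content"),
    (lambda c: c["watch_time"] < 5,
     "Send curated content recommendations"),
    (lambda c: c["payment_failures"] >= 1,
     "Reach out to resolve payment issues"),
    (lambda c: c["customer_support_calls"] >= 4,
     "Proactive outreach to resolve ongoing issues"),
    (lambda c: c["tenure_in_months"] < 6,
     "Onboarding follow-up"),
]


def targeted_actions(customer):
    """Return the action for every rule whose condition the customer meets."""
    return [action for applies, action in RULES if applies(customer)]


at_risk = {
    "last_login_days": 45, "login_frequency": 3, "watch_time": 2.5,
    "payment_failures": 2, "customer_support_calls": 4, "tenure_in_months": 2,
}
engaged = {
    "last_login_days": 1, "login_frequency": 25, "watch_time": 40.0,
    "payment_failures": 0, "customer_support_calls": 0, "tenure_in_months": 36,
}

for action in targeted_actions(at_risk):
    print("-", action)
print("Engaged customer actions:", targeted_actions(engaged))
```

Keeping the rules in a data structure instead of a chain of `if` statements makes it easy to add, remove, or reorder rules without touching the evaluation logic.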
- Python 3.8 or higher
- pip (Python package manager)
git clone https://github.com/your-username/customer-churn-prediction.git
cd customer-churn-prediction
pip install pandas numpy scikit-learn shap matplotlib joblib

| Library | Version | Purpose |
|---|---|---|
| pandas | ≥1.3.0 | Data manipulation |
| numpy | ≥1.21.0 | Numerical operations |
| scikit-learn | ≥1.0.0 | Machine learning |
| shap | ≥0.40.0 | Explainability |
| matplotlib | ≥3.4.0 | Visualizations |
| joblib | ≥1.1.0 | Model saving/loading |
| tkinter | (built-in) | Desktop UI |
# Step 1: Generate the dataset
python data/generate_data.py
# Step 2: Train the model
python src/train_model.py
# Step 3: Launch the UI
python ui/app.py

# Test prediction module
python src/predict.py
# Test recommendation module
python src/recommend.py
# Test explainability module
python src/explain.py

- Launch with `python ui/app.py`
- Enter customer data in the input fields
- OR click "Load High-Risk Sample" / "Load Low-Risk Sample"
- Click "Predict Churn"
- View results: probability, risk level, explanations, and recommendations
Input:
- Age: 25, Gender: Male, Subscription: Basic
- Monthly Charges: $12.99, Tenure: 2 months
- Login Frequency: 3/month, Last Login: 45 days ago
- Watch Time: 2.5 hours, Payment Failures: 2
- Support Calls: 4
Output:
======================================================================
CHURN PREDICTION RESULTS
======================================================================
CHURN PROBABILITY: 94.0%
PREDICTION: Churn
RISK LEVEL: HIGH
Risk Meter: [###############################################---] 94%
----------------------------------------------------------------------
TOP FACTORS INFLUENCING CHURN:
----------------------------------------------------------------------
1. Days Since Last Login: 45 days
-> INCREASES churn risk by 25.0%
2. Payment Failures: 2
-> INCREASES churn risk by 18.0%
3. Login Frequency: 3
-> INCREASES churn risk by 12.0%
----------------------------------------------------------------------
RECOMMENDED ACTIONS:
----------------------------------------------------------------------
Status: ALERT - 94% probability of churning. Immediate action required!
Primary Action: Immediate retention intervention required
Specific Actions:
1. Reach out to resolve payment issues
2. Send "We miss you" email with exclusive content
3. Offer personalized discount (20-30% off)
4. Recommend personalized content based on past preferences
5. Assign dedicated account manager
======================================================================
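The text-based risk meter in the output above can be produced with a few lines of string formatting. The sketch below assumes a 50-character bar, which is a guess based on the three trailing dashes at 94%, and the function name `risk_meter` is illustrative.

```python
def risk_meter(probability, width=50):
    """Render a churn probability in [0, 1] as a fixed-width text bar."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    filled = round(probability * width)  # number of '#' cells
    return "[" + "#" * filled + "-" * (width - filled) + "]"


probability = 0.94
print(f"CHURN PROBABILITY: {probability:.1%}")
print(f"Risk Meter: {risk_meter(probability)} {probability:.0%}")
```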
| Category | Technology |
|---|---|
| Programming Language | Python 3.x |
| Data Processing | pandas, numpy |
| Machine Learning | scikit-learn |
| Explainability | SHAP |
| Visualization | matplotlib, seaborn |
| Model Persistence | joblib |
| Desktop UI | Tkinter |
| Future UI | CustomTkinter / PyQt |
This project is developed for educational and academic purposes.
Blake whirlow
- SHAP library for explainable AI
- scikit-learn for machine learning algorithms
- The open-source Python community
Last Updated: January 2026


