Agungvpzz/Survival-Analysis

If the Jupyter Notebook fails to render on GitHub, please use the link below:
Survival Analysis Notebook

Telco-Churn-Survival-Analysis

A. Introduction

In this repository, I will conduct survival analysis using Python, utilizing Plotly for interactive data visualization. The analysis will include exploratory data analysis, survival function estimation using the Kaplan-Meier method, and hazard modeling with the Cox Proportional Hazards model via CoxPHFitter from the lifelines package.

B. Business Understanding

1. Business Goals

The primary goal of this analysis is to understand customer behavior over time and predict the likelihood of churn. By leveraging survival analysis techniques, businesses can gain insights into customer retention patterns, optimize engagement strategies, and make data-driven decisions to maximize long-term customer value.

2. Objective of this analysis

  1. Estimate customer retention over time using survival analysis.
  2. Identify key drivers of churn through hazard modeling.
  3. Provide actionable insights to inform retention strategies.

3. Key Questions to Answer

  1. How long do customers typically stay, and when are they most likely to churn?
  2. What factors most strongly influence churn risk?
  3. How can these insights be used to improve customer retention?

C. Data Understanding

  • The dataset can be explored and downloaded via the following link: telco-customer-churn.
  • Learn more about the dataset through this link: Legend.

D. Methodology

1. Exploratory Data Analysis (EDA)

  1. Visualizing churn composition using a pie chart.
  2. Visualizing churn composition and association (with chi-squared tests) across categorical features using a bar chart grouped by variable and stacked by churn status.
  3. Visualizing churn distribution across tenure using a stacked bar chart.
  4. Visualizing churn distribution across tenure under different conditions using multiple stacked bar charts.
  5. Visualizing the Pearson correlation coefficient between churn and each feature using a bar chart.
  6. Visualizing unique combinations of significant categorical features using a parallel categories chart.
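
The chi-squared association test mentioned in item 2 can be sketched by hand for a single feature. This is a minimal illustration on made-up counts, not the notebook's code; the function name `chi_squared` and the table values are assumptions. It computes Pearson's statistic without continuity correction (SciPy's `chi2_contingency` applies Yates' correction to 2×2 tables by default, so its result would differ slightly).

```python
import math


def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of feature and churn.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts: rows = feature value, columns = (stayed, churned).
toy_table = [[90, 10], [60, 40]]
stat, dof = chi_squared(toy_table)

# With one degree of freedom the p-value has a closed form via erfc.
p_value = math.erfc(math.sqrt(stat / 2)) if dof == 1 else None
print(f"chi2={stat:.2f}, dof={dof}, p={p_value:.2e}")
```

A larger statistic for the same degrees of freedom means a stronger departure from independence, which is why the charts in section E are sorted by χ².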

2. Survival Function Estimation using the Kaplan-Meier Method (Non-Parametric)

  1. Visualizing the survival function curve and its confidence interval using a line chart.
  2. Stratify survival curves by categorical variables to compare retention patterns.
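
For readers unfamiliar with the Kaplan-Meier method, the estimator can be written in a few lines of plain Python. This is a sketch on toy data, not the lifelines implementation used in the notebook; `kaplan_meier` and the sample values are hypothetical. Here `events[i]` is 1 if customer `i` churned and 0 if the observation is censored (still active).

```python
def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct event time (product-limit estimator)."""
    event_times = sorted({t for t, e in zip(durations, events) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Customers still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Customers who churned exactly at time t.
        churned = sum(1 for d, e in zip(durations, events) if d == t and e)
        survival *= 1 - churned / at_risk
        curve.append((t, survival))
    return curve


# Toy tenures (months) and churn flags; 0 marks a censored customer.
toy_durations = [1, 2, 2, 3, 5, 5]
toy_events = [1, 1, 0, 1, 0, 1]

km_curve = kaplan_meier(toy_durations, toy_events)
for t, s in km_curve:
    print(f"t={t}: S(t)={s:.4f}")
```

Censored customers never count as churn events, but they stay in the at-risk set until their last observed tenure, which is what distinguishes this estimate from a naive "share of customers remaining".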

3. Hazard Modeling with the Cox Proportional Hazards Model (Semi-Parametric)

  1. Data Preparation
    • Drop features whose χ² p-values indicate no significant association with churn.
    • Encode categorical variables using rank-based target mean encoding.
    • Build the survival target with Surv.from_dataframe, then split the data, reserving 30% for testing and stratifying on churn to keep class balance.
  2. Feature Engineering
    • Apply quantile binning to continuous variables (e.g., TotalCharges).
  3. Model Training
    • Fit a Cox Proportional Hazards model using default hyper-parameters.
  4. Model Evaluation
    • Assess summary stats: coefficients, hazard ratios, p-values.
    • Compute:
      • Standard Concordance Index (C-Index)
      • Censored C-Index
      • Cumulative Dynamic AUC over tenure
  5. Model Comparison
    • Contrast results from the lifelines and scikit-survival implementations.
    • Benchmark predictive performance across alternative feature sets.
  6. Model Selection
    • Select the best-performing configuration.
  7. Model Visualisation
    • Plot coefficient bars with 95% CIs.
    • Plot covariate partial effects to show how within-group changes shift the survival curve (line chart).
    • Draw time-dependent ROC curves to illustrate accuracy over time.
    • Plot survival curves stratified by hazard-risk quartiles to display risk separation.
    • Overlay predicted vs. observed time-to-event curves to assess calibration.
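
Rank-based target mean encoding, as named in the Data Preparation step, can be sketched as follows. This is an interpretation rather than the notebook's code: each category is replaced by the rank of its mean churn rate (0 = lowest churn). The helper name `rank_target_encode` and the toy data are hypothetical, and in practice the mapping would normally be learned on the training split only.

```python
from collections import defaultdict


def rank_target_encode(categories, target):
    """Map each category to the rank of its mean target value (0 = lowest)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, y in zip(categories, target):
        sums[category] += y
        counts[category] += 1
    means = {c: sums[c] / counts[c] for c in counts}
    # Sort categories from lowest to highest mean churn, then number them.
    ordered = sorted(means, key=means.get)
    return {category: rank for rank, category in enumerate(ordered)}


# Toy data: contract type and churn flag per customer.
toy_contract = ["Month-to-month"] * 4 + ["One year"] * 3 + ["Two year"] * 3
toy_churn = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

mapping = rank_target_encode(toy_contract, toy_churn)
encoded = [mapping[c] for c in toy_contract]
print(mapping)
```

On this toy sample the resulting order (Two year, One year, Month-to-month) matches the 0/1/2 coding used for Contract in the model summary.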

4. Tools and Libraries

  • Tools: Python, JupyterLab, Git, GitHub.
  • Python Libraries: lifelines, scikit-survival, scikit-learn, plotly, pandas, numpy, streamlit, scipy.

E. Exploratory Data Analysis

1. Churn Composition

Churn Composition
The pie chart shows that 26.5% (1869) of our customers have churned.

2. Churn Composition Across Categorical Features

Churn Composition Across Categorical Features

  • Sorted by chi-square (χ²) values, the bar chart highlights how churn composition varies across each categorical feature.
  • Almost all features shown in the chart differ significantly in churn composition according to their χ² values; the exceptions are Gender and PhoneService, which show weak or negligible associations.
  • Moreover, we highlight several insights based on feature groupings below:
    • Socio-Demographic Features:
      • Senior Citizen: Senior customers (age ≥ 65) have a 41.68% churn rate, almost twice as high as non-seniors (23.61%).
      • Dependents: Customers without dependents have a 31.28% churn rate, nearly twice as high as those with dependents (15.45%).
      • Partner: Customers without partners have a 32.96% churn rate, notably higher than those with partners (19.66%).
    • Payment Features:
      • Contract: Customers on monthly contracts have a 42.7% churn rate, significantly higher than other types, and represent the largest customer group (55%).
      • Payment Method: Those using electronic check as a payment method show a 45.29% churn rate, roughly two to three times that of the other methods.
      • PaperlessBilling: Customers enrolled in paperless billing have a 33.57% churn rate, more than double that of those who are not (16.33%).
    • Service Features:
      • InternetService: Customers with internet service (either fiber optic or DSL) show significantly higher churn compared to those without. Specifically, fiber optic customers have a 41.89% churn rate, more than twice that of DSL customers (18.96%), and nearly six times that of customers without internet service.
        • This distinction between internet customers and non-customers also impacts related features: OnlineSecurity, OnlineBackup, DeviceProtection, TechSupport, StreamingTV, and StreamingMovies.
      • OnlineSecurity: Customers without this service experience a 41.77% churn rate, nearly three times higher than subscribers.
      • TechSupport: Those lacking tech support show a 41.64% churn rate, almost triple that of customers with support.
      • OnlineBackup: Customers not using online backup have a 39.93% churn rate, nearly twice as high as customers who do.

3. Churn Distribution Across Tenure Periods

Churn Distribution Across Tenure Periods

  • The bar chart reveals a sharp spike in churn (around 380 customers) early in the first month. This suggests that many customers leave shortly after joining, possibly due to onboarding issues, unmet expectations, or a mismatch between the product and customer needs.
  • After the initial drop-off, churn steadily decreases through the end of year 1, then remains stable up to year 5, indicating that customers who remain beyond the first few months are more likely to stay loyal.
  • After year 5, the number of customers at each tenure period rises gradually, with a big jump in the last three tenure periods, especially the final one. This suggests that the earliest customers, especially those who joined in the first three months and have been engaged for approximately 70 tenure periods, tend to be the most loyal.

4. Churn Distribution Across Tenure Periods under Different Conditions

Example Feature: Contract

Churn Distribution Across Tenure Periods under Different Conditions

  • The bar charts clearly illustrate how churn composition varies across customer tenure for each contract type.
  • For Month-to-month contracts, the majority of churn occurs within the first year, particularly in the first month. Churn then drops sharply until month six, followed by a steady decline. Very few customers remain active beyond five years.
  • The One-year contract displays a relatively uniform distribution of churn across tenure, suggesting consistent retention dynamics throughout the contract period.
  • The Two-year contract shows a higher proportion of long-term customers, although this concentration only becomes apparent after year 5. This suggests strong loyalty among existing subscribers. However, the sharp drop in newer customers on this contract type, particularly after the first three months of engagement, may signal a shift in preference toward more flexible, short-term plans, potentially posing challenges for future retention.

5. Pearson Correlation Coefficient Between Churn and each Feature

Pearson Correlation Coefficient Between Churn and each Feature

  • The bar chart shows that MonthlyCharges has a mild positive correlation (~0.20) with churn. This likely reflects pricing differences, as short-term contracts, linked to higher churn, often come with higher monthly costs.
  • TotalCharges shows a negative correlation with churn, as higher spending typically reflects longer tenure. This suggests that these customers are either still active or have remained active longer than those who churned.
  • Together, MonthlyCharges, TotalCharges, and Contract capture overlapping aspects of tenure and pricing. While not a direct case of data leakage, their interdependence warrants caution, especially with multicollinearity in modeling.
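
Because Churn is binary, the Pearson coefficient here is what is often called the point-biserial correlation. The sketch below computes it from scratch on invented numbers; `pearson` and the toy values are assumptions for illustration, not data from the notebook.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(spread_x * spread_y)


# Toy monthly charges and churn flags (1 = churned).
toy_monthly = [20, 30, 50, 70, 80, 90]
toy_churn = [0, 0, 0, 1, 1, 1]

r = pearson(toy_monthly, toy_churn)
print(f"r = {r:.3f}")
```

The toy sample is deliberately clean, so its coefficient is far higher than the ~0.20 reported for MonthlyCharges; the point is only to show what the bar chart measures.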

6. Combination of Significant Categorical Features

a. Socio-Demographic Features

Socio-Demographic Features Socio-Demographic Features Table

  • Customers who have a partner, are not senior citizens, and live with dependents have the lowest churn rate, at only 13.75%.
  • Customers who have no partner, are senior citizens, and have no dependents exhibit the highest churn rate, approaching 50%.

b. Service Features

Service Features Service Features Table

  • Customers who subscribe to DSL internet service along with both OnlineBackup and OnlineSecurity have the lowest churn rate at 6.75%.
    • However, when they do not subscribe to these services (OnlineBackup and OnlineSecurity), the churn rate increases significantly to 34.6%.
  • Customers without internet service have a churn rate of 7.4%.
  • Customers who subscribe only to Fiber Optic internet service, without OnlineBackup and OnlineSecurity, have the highest churn rate at 55.8%.

c. Payment Features

Payment Features Payment Features Table

  • Customers with a two-year contract generally have a churn rate of only 2.83%.
    • Among them, those who use Mailed Check as their payment method and opt for manual (non-paperless) billing have the lowest churn rate at just 0.37%.
    • Conversely, the highest churn rate within this group is 9.82% for customers who use Electronic Check with paperless billing.
  • Within each contract type, customers who use Electronic Check with paperless billing churn at roughly twice the rate of other customers.
  • About 42% of customers with a monthly contract have churned, with the highest churn rate at 57.7% for those who use Electronic Check with paperless billing.

F. Survival Function Estimation using the Kaplan-Meier Method

1. Survival Function Curve

Survival Function Curve
Survival Function Curve Report

  • The survival curve reveals three key patterns in customer retention:
  • Early Churn (First Month):
    • The steepest drop in the survival curve, about 5%, occurs within the first month of customer engagement.
    • Approximately 20% of all churned customers leave during this initial period.
    • This suggests that a substantial segment of customers are evaluating the service and decide to discontinue early, likely due to unmet expectations or poor initial experience.
  • High-Risk Period (Month 1 to Month 12):
    • From month 1 through month 12, the survival probability declines further from ~95% to ~84%, a drop of over 10 percentage points.
    • During this period, about 657 customers, roughly 35% of all churned customers, choose to exit the service.
    • This one-year window is a critical phase for customer retention efforts, as 55% of churned customers leave before forming long-term habits or commitments.
  • Gradual Decline (After Month 12):
    • After the first year, the survival curve continues a slow but steady descent.
    • Annual churn becomes more gradual, with survival probability decreasing by roughly 5 percentage points per year.
    • The distribution of churned customers across each year is approximately:
      • Year 2: 16% of all churned customers (~5 percentage point drop in the survival curve)
      • Year 3: 9% (~4 point drop)
      • Year 4: 8% (~4 point drop)
      • Year 5: 7% (~5 point drop)
      • Year 6: 5% (~7 point drop)
    • This pattern reflects ongoing but less concentrated attrition, likely driven by long-term changes in customer needs, emerging alternatives, or a gradual decline in perceived value.

2. Survival Curves Across the Unique Values of each Categorical Feature

a. Socio-Demographic Features

Socio-Demographic Features
Socio-Demographic Features-Times
Observed Churn (%)

  • Among all socio-demographic features, only Gender did not reach statistical significance.
  • Final Survival (%) Below 50%:
    • Partner | No: 46.32% (drops below 50% at the 68th tenure period)
    • SeniorCitizen | 1: 42.13% (drops below 50% at the 65th tenure period)
  • Marked Differences in Mean Survival Drop by Tenure Across Categorical Values (%):
    • Dependents: 0.66% vs 0.34% (Dependents | No vs Dependents | Yes)
    • Partner: 0.74% vs 0.42% (Partner | No vs Partner | Yes)
    • SeniorCitizen: 0.79% vs 0.50% (SeniorCitizen | 1 vs SeniorCitizen | 0)
  • Churn Concentration (Average Across Different Categorical Features):
    • In the first year of tenure, an average of 52.6% of churned customers leave.
    • By the 24th tenure period (2 years), about two-thirds (68.78%) of churned customers have left.
  • Notable Insights:
    • Partner: First-month churn difference between Partner | No (26.17%) and Partner | Yes (9.87%)
    • All demographic features, except Gender, exhibit approximately a 20% difference in final survival rates across their unique values.
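
The text above does not name the significance test behind these comparisons. One common choice for comparing two Kaplan-Meier curves is the log-rank test, sketched below in plain Python under that assumption. The function `logrank_test` and the toy groups are hypothetical; the p-value uses the closed form for a chi-squared variable with one degree of freedom.

```python
import math


def logrank_test(dur_a, ev_a, dur_b, ev_b):
    """Two-group log-rank statistic and p-value (1 degree of freedom)."""
    pooled = zip(dur_a + dur_b, ev_a + ev_b)
    event_times = sorted({t for t, e in pooled if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for d in dur_a if d >= t)  # at risk in group A
        n_b = sum(1 for d in dur_b if d >= t)  # at risk in group B
        d_a = sum(1 for d, e in zip(dur_a, ev_a) if d == t and e)
        d_b = sum(1 for d, e in zip(dur_b, ev_b) if d == t and e)
        n, d = n_a + n_b, d_a + d_b
        # Expected events in A if both groups shared one hazard.
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    stat = observed_minus_expected ** 2 / variance if variance > 0 else 0.0
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Toy groups: A churns early, B churns late; every customer churns.
early = [1, 1, 2, 2, 3, 3]
late = [8, 9, 9, 10, 10, 10]
all_events = [1] * 6

stat_diff, p_diff = logrank_test(early, all_events, late, all_events)
stat_same, p_same = logrank_test(early, all_events, early, all_events)
print(f"different groups: p={p_diff:.4f}; identical groups: p={p_same:.4f}")
```

A small p-value, as for the two toy groups that churn at very different tenures, indicates that the survival curves differ by more than chance would explain.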

b. Product/Service Features

Service Features Service Features-Times
Observed Churn (%)

  • All product and service features, except PhoneService, show a significant p-value.
  • Final Survival (%) Below 50%:
    • InternetService | Fiber optic: 41.70% (drops below 50% at 65th tenure period)
    • OnlineBackup | No: 39.23% (drops below 50% at 52nd tenure period)
    • OnlineSecurity | No: 33.09% (drops below 50% at 53rd tenure period)
    • TechSupport | No: 34.92% (drops below 50% at 53rd tenure period)
  • Marked Differences in Mean Survival Drop by Tenure Across Categorical Values (%):
    • InternetService: 0.80% vs 0.13% (InternetService | Fiber optic vs InternetService | No)
    • OnlineBackup: 0.83% vs 0.45% (OnlineBackup | No vs OnlineBackup | Yes)
    • OnlineSecurity: 0.92% vs 0.31% (OnlineSecurity | No vs OnlineSecurity | Yes)
    • TechSupport: 0.89% vs 0.33% (TechSupport | No vs TechSupport | Yes)
  • Churn Concentration (Average Across Different Categorical Features):
    • In the first year of tenure, an average of 59.8% of churned customers leave.
    • By the 24th tenure period (2 years), nearly three-quarters (73.4%) of churned customers have left.
      • At this tenure point, only 46.85% of churned customers under OnlineBackup | Yes had exited, indicating delayed churn compared to other groups.
      • Meanwhile, under InternetService | No, approximately 90% of churned customers had already left by this point in tenure.
  • Notable Insights:
    • 51.3% of churned customers under InternetService | No leave within the first month (one month after subscribing).
    • Despite this early churn, customers under InternetService | No retain a final survival rate of 90% by the end of the tenure period.

c. Payment Features

Payment Features Payment Features-Times
Observed Churn (%)

  • All payment features, especially Contract, show a significant p-value.
  • Final Survival (%) Below 50%:
    • Contract | Month-to-month: 12.90% (drops below 50% at 35th tenure period)
    • PaymentMethod | Electronic check: 29.45% (drops below 50% at 47th tenure period)
  • Marked Differences in Mean Survival Drop by Tenure Across Categorical Values (%):
    • Contract: 1.19% vs 0.09% (Contract | Month-to-month vs Contract | Two year)
    • PaperlessBilling: 0.67% vs 0.36% (PaperlessBilling | Yes vs PaperlessBilling | No)
    • PaymentMethod: 0.97% vs 0.33% (PaymentMethod | Electronic check vs PaymentMethod | Credit card (automatic))
  • Churn Concentration (Average Across Different Categorical Features):
    • In the first year of tenure, an average of 44.1% of churned customers leave.
      • Customers with one- or two-year contracts tend to stick to their agreements: under 10% of their churned customers leave in the first year.
      • By contrast, 80.19% of churned customers using PaymentMethod | Mailed check have already left by the end of the first year.
    • By the 24th tenure period (2 years), more than half (56.7%) of churned customers have left.
      • Even after 2 years, no customer under Contract | Two year has churned.
      • Meanwhile, around 17.5% of churned customers under Contract | One year have left.
      • Strikingly, about 90% of churned customers using PaymentMethod | Mailed check have left by this point.
  • Notable Insights:
    • 41.88% of churned customers using PaymentMethod | Mailed check leave within the first month (one month after subscribing).
    • Contract | Two year shows the highest final survival rate at 93.57%, while Contract | Month-to-month has the lowest at 12.9%.

G. Hazard Modeling Using Cox Proportional Hazards

1. Model Evaluation

Model Evaluation

The survival model exhibits strong predictive performance with minimal overfitting, as shown by the following metrics:

  • Concordance Index (C-Index) & C-IndexC (Censored):
    • Train: 0.9464
    • Test: 0.9414
    • High agreement between predicted and actual survival rankings, including censored cases.
  • Cumulative Dynamic AUC:
    • Train: 0.9730
    • Test: 0.9711
    • Excellent discriminatory ability in time-dependent survival probability estimation.
  • The close alignment between train and test results highlights good generalization and model robustness for survival prediction.
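
To make the C-Index concrete, the sketch below computes Harrell's concordance index by brute force. It is a simplified illustration (ties in time are ignored) and not the scikit-survival implementation; `concordance_index` and the toy values are hypothetical. A pair of customers is comparable when the one with the shorter tenure actually churned, and it is concordant when that customer also has the higher predicted risk.

```python
def concordance_index(durations, events, risk_scores):
    """Share of comparable pairs ranked correctly by the risk score."""
    concordant = 0.0
    comparable = 0
    n = len(durations)
    for i in range(n):
        if not events[i]:
            continue  # a censored customer cannot anchor a comparable pair
        for j in range(n):
            if durations[i] < durations[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    return concordant / comparable


# Toy tenures, churn flags (0 = censored) and predicted risk scores.
toy_durations = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.7, 0.8, 0.1]

c_index = concordance_index(toy_durations, toy_events, toy_risk)
print(f"C-index = {c_index:.2f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported scores of about 0.94 in context.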

2. Model Comparison

Model Comparison

  • CoxPHFitter, CoxPHSurvivalAnalysis: Include all columns.
  • cph2, cphsk2: Exclude columns with an insignificant p-value (> 0.05).
  • cph3, cphsk3: Exclude columns with an insignificant logp value (< 10).
  • As shown in the comparison table above, the differences between the models and their combined feature sets are not significant. Therefore, we can select any of them, or strategically opt for the one with the fewest predictors to reduce computational load.

3. Model Selection

  • After several experiments to identify the optimal combination of predictor sets, we found that Contract, InternetService, TotalCharges, and TotalCharges (Q) form the most effective combination, as shown in the table below.

Optimized Predictors

  • These predictor sets yield higher performance across all evaluation metrics.
  • Concordance Index (C-Index) & C-IndexC (Censored):
    • Train: 0.9464 -> 0.9565
    • Test: 0.9414 -> 0.9526
  • Cumulative Dynamic AUC:
    • Train: 0.9730 -> 0.9788
    • Test: 0.9711 -> 0.9773

4. Model Summary

Model Summary

  • Since we encoded our categorical variables using rank‐mean target encoding (rank 1 = lowest churn), we assume a linear effect across ranks.
  • All predictors are statistically significant (p ≈ 0.0).
  • TotalCharges (Q):
    • coef = –1.787, HR = exp(–1.787) = 0.168
    • A one‐quantile increase in TotalCharges (e.g. Q1 → Q2) reduces the hazard by ~83.2% (1 – 0.168).
  • TotalCharges:
    • coef = –0.001, HR = exp(–0.001) = 0.999
    • Each extra dollar in TotalCharges cuts the churn hazard by ≈ 0.1%.
  • Contract (0=Two-year → 1=One-year → 2=Month-to-month):
    • coef = 1.578, HR = exp(1.578) = 4.846
    • A one-rank step (e.g. Two-year → One-year) multiplies the churn hazard by ~4.85.
    • Moving from Two-year (0) to Month-to-month (2) (Δ = 2 ranks) multiplies the hazard by exp(1.578 × 2) ≈ 23.5.
  • InternetService (0=No → 1=DSL → 2=Fiber):
    • coef = 2.006, HR = exp(2.006) = 7.430
    • A one-rank step (No → DSL) multiplies the hazard by ~7.43.
    • Going from No Internet (0) to Fiber (2) multiplies the hazard by exp(2.006 × 2) ≈ 55.
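
The hazard-ratio arithmetic in this summary can be reproduced directly from the reported coefficients. The snippet is a small check, not part of the notebook; the helper `hazard_ratio` is a hypothetical name, and the coefficients are copied from the list above.

```python
import math

# Coefficients as reported in the model summary above.
coefs = {
    "TotalCharges (Q)": -1.787,
    "TotalCharges": -0.001,
    "Contract": 1.578,
    "InternetService": 2.006,
}


def hazard_ratio(coef, delta=1):
    """Multiplicative change in hazard for a shift of `delta` units."""
    return math.exp(coef * delta)


for name, coef in coefs.items():
    print(f"{name}: HR per unit = {hazard_ratio(coef):.3f}")

# Two-rank moves: Two-year -> Month-to-month, and No internet -> Fiber.
contract_two_ranks = hazard_ratio(coefs["Contract"], delta=2)
internet_two_ranks = hazard_ratio(coefs["InternetService"], delta=2)
print(f"Contract, 2 ranks: {contract_two_ranks:.1f}")
print(f"InternetService, 2 ranks: {internet_two_ranks:.1f}")
```

Because the model is linear in the encoded rank, a two-rank move is simply the one-rank hazard ratio squared; this is the linearity assumption noted at the top of this section.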

5. Model Visualization

a. Feature Coefficients

Feature Coefficients

  • The floating bar chart above illustrates each covariate’s coefficient alongside its confidence interval.
  • Among all variables, TotalCharges (Q), Contract, and InternetService apparently emerge as the strongest predictors of churn risk. Yet, these categorical variables influence the hazard function in discrete steps, causing distinct jumps in the predicted risk, which often result in larger coefficient magnitudes.
  • Meanwhile, TotalCharges, being continuous, contributes to a smoother, more gradual shift in risk, and its coefficient tends to be small, reflecting the incremental effect of each additional dollar on churn risk.

b. Time-Dependent ROC Curve

Time-Dependent ROC Curve

  • High initial AUCs: Both training and testing curves start high (~0.95), indicating strong early predictive performance.
  • Stable Mid-range Performance (Tenure 5–40): The AUC remains very high (around 0.99), suggesting the model is performing exceptionally well in this middle range of time.
  • Degradation Over Time (Post-40 Tenure): AUC values for both train and test begin to decline gradually, with a noticeable drop after around tenure 60. This could be due to fewer samples being available at longer tenures, or to model generalization weakening over time.
  • Train vs Test Consistency: The test curve closely follows the train curve, indicating good generalization and low overfitting.

c. Covariate Partial Effects

Example Covariate: InternetService

InternetService

  • As shown in the chart above, survival curves differ significantly across InternetService types.
  • The survival curve for Fiber optic declines steadily from the start of the tenure period, falling below 50% after 26 months.
  • After three years, the survival rate for Fiber optic drops to 21%, whereas DSL and No InternetService retain higher rates of 81% and 97%, respectively.
  • By year four, the survival rate for Fiber optic falls to 0%, DSL declines sharply to 46.5%, while No InternetService still maintains a rate above 90%.
  • By year five, only the No InternetService category continues to retain a substantial portion of customers, with a survival rate of 68.4%.
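
Under the proportional hazards assumption, partial-effect curves are linked by a simple rule: if two customer profiles differ only by Δ in one covariate, then S_b(t) = S_a(t)^exp(β·Δ). The sketch below applies this to the three-year figures above, taking the rounded 97% for No InternetService as the reference and the InternetService coefficient of 2.006 from the model summary. The helper name is hypothetical, and the result is only approximate because the inputs are rounded.

```python
import math


def survival_under_shift(reference_survival, coef, delta):
    """Survival probability after shifting one covariate by `delta` units,
    holding the other covariates fixed (proportional hazards)."""
    return reference_survival ** math.exp(coef * delta)


INTERNET_COEF = 2.006        # from the model summary
REFERENCE_AT_3_YEARS = 0.97  # No InternetService, as reported above (rounded)

dsl = survival_under_shift(REFERENCE_AT_3_YEARS, INTERNET_COEF, delta=1)
fiber = survival_under_shift(REFERENCE_AT_3_YEARS, INTERNET_COEF, delta=2)
print(f"DSL: {dsl:.1%}, Fiber optic: {fiber:.1%}")
```

The implied values land close to the 81% and 21% read off the chart, which is what one would expect given the rounding of the reference value.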

d. Survival Function Curve Based on Hazard Quartiles

Hazard Quartiles

  • The survival curves clearly demonstrate how well the hazard model separates customers based on quartile-based risk levels.
  • Each curve successfully distinguishes churned from non-churned customers, with the churned group appearing lower and the non-churned group higher, as expected.
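
The quartile grouping behind this chart can be sketched as follows: customers are split into four groups by predicted risk score, and a survival curve is then estimated per group. How the notebook builds its groups is not shown here, so the helper `assign_risk_quartiles` and the toy scores are assumptions.

```python
from bisect import bisect_right
from statistics import quantiles


def assign_risk_quartiles(risk_scores):
    """Label each score 0 (lowest-risk quartile) to 3 (highest-risk quartile)."""
    cut_points = quantiles(risk_scores, n=4)  # three quartile boundaries
    return [bisect_right(cut_points, score) for score in risk_scores]


# Toy predicted risk scores, for example partial hazards from a fitted model.
toy_scores = [1, 2, 3, 4, 5, 6, 7, 8]
groups = assign_risk_quartiles(toy_scores)
print(groups)
```

If the model discriminates well, the Kaplan-Meier curve of group 3 should sit clearly below that of group 0, which is the separation described above.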

e. Survival Function Curve for Time-to-Event Predictions

Time-to-Event Predictions

  • Using customer samples, as shown in the chart above, the model captures several churn events with good accuracy.

H. Conclusions

1. Exploratory Data Analysis

  • Churn Rate: Approximately 26.5% (1869) of our customers have churned.
  • Statistical Associations: All categorical features, except for Gender and PhoneService, show a statistically significant association with Churn.
  • Tenure and Contract Type Insights:
    • The Month-to-month contract type exhibits the highest churn rate, particularly during the first month. Given that this plan is used by the majority of our customers, it underscores the company's vulnerability if it relies solely on short-term contracts without effective customer retention strategies.
    • The Two-year contract is associated with a higher proportion of customers remaining active for over five years. However, the lower number of newer customers on this contract might indicate either a shift in customer preference or the impact of targeted marketing strategies that encourage a move toward short-term contracts.
  • Service Features Insights:
    • Customers without internet service demonstrate higher loyalty, with a churn rate of only 7.4%. In contrast, Fiber Optic and DSL customers exhibit churn rates of 41.9% and 19.0%, respectively.
    • Among customers with internet service, those who do not subscribe to any additional services, such as OnlineSecurity, TechSupport, OnlineBackup, or DeviceProtection, experience a significantly higher churn rate of approximately 40%, twice as high as those who subscribe to at least one of these services.
  • Socio-Demographic Features Insights:
    • Customers without dependents (Dependents = No) have a churn rate of 31.3%, which is twice as high as those with dependents (15.5%).
    • Customers classified as Senior Citizens (SeniorCitizen = 1, i.e., aged 65 or older) have a churn rate of 41.7%, nearly double that of non-senior customers (23.6%).
    • Customers without a partner (Partner = No) exhibit a 33.0% churn rate, notably higher than those with a partner (19.7%).
  • Payment Features Insights:
    • Customers who use the electronic check payment method have a churn rate of 45.29%, roughly two to three times that of customers using other payment methods.
    • Customers enrolled in paperless billing have a churn rate of 33.57%, roughly double that of customers who are not (16.33%).

2. Survival Function Estimation

  • Final Survival (%): 59.28% at the final tenure period.
  • Average Survival Drop per Tenure (%): -0.56% per tenure.
  • Survival Curve Highlights Three Distinct Retention Phases:
    • a sharp early churn in the first month,
    • a high-risk churn window within the first year, and
    • a gradual decline in customer retention thereafter.
    • Notably, around 55% of churn occurs within the first year, highlighting the importance of early engagement strategies.
  • Survival Function Across socio-demographic features:
    • Attributes like having a partner, not being a senior citizen, or having dependents are associated with higher survival rates; Gender is the only statistically insignificant factor.
    • Customers without partners or dependents, and senior citizens, exhibit lower survival rates, suggesting these groups may need more targeted retention strategies.
  • Survival Function Across Product and Service Features:
    • The absence of support services such as OnlineSecurity, OnlineBackup, and TechSupport is strongly associated with lower survival rates.
    • Notably, customers using InternetService | No show early dropout but maintain a high final survival rate, highlighting potentially different usage intentions or expectations.
    • These findings suggest that bundling essential services or improving onboarding for feature adoption could positively impact retention.
  • Survival Function Across Payment Features:
    • Customers on month-to-month contracts or those using Electronic check or Mailed check as payment methods show significantly lower survival rates.
    • In contrast, long-term contract customers, particularly those on Contract | Two year, maintain the highest survival rates.
    • These patterns reinforce the protective effect of long-term agreements and automated payment methods on customer retention.

3. Hazard Modeling

  • The hazard modeling analysis using the Cox Proportional Hazards model reveals a robust and well-calibrated predictive framework for understanding customer churn risk over time.
  • Best Predictors:
    • Contract: coef = 1.578, HR = exp(1.578) = 4.846
    • InternetService: coef = 2.006, HR = exp(2.006) = 7.430
    • TotalCharges: coef = –0.001, HR = exp(–0.001) = 0.999
    • TotalCharges (Q): coef = –1.787, HR = exp(–1.787) = 0.168
  • Model Scores:
    • Concordance Index (C-Index) & C-IndexC (Censored):
      • Train: 0.9565
      • Test: 0.9526
    • Cumulative Dynamic AUC:
      • Train: 0.9788
      • Test: 0.9773
    • Both the Concordance Index and cumulative dynamic AUC reached values above 0.95, demonstrating high discriminatory power and strong predictive accuracy throughout the customer tenure period.
  • Feature Importance:
    • Contract type and internet service type are the dominant categorical predictors, exerting large, discrete impacts on churn risk.
      • For instance, transitioning from a two-year to a month-to-month contract can increase churn risk by over 23 times, while moving from no internet to fiber optic service may increase the hazard by over 55 times, emphasizing the behavioral sensitivity associated with short-term contracts and high-speed services.
    • On the continuous side, TotalCharges (both raw and quantile-transformed) shows significant yet smoother effects on churn probability.
      • Specifically, a one-quantile increase in TotalCharges reduces the hazard by over 83%, while every additional dollar spent reduces churn risk by approximately 0.1%, reinforcing the protective nature of higher spending or longer tenure.
    • These findings can directly inform targeted retention strategies, such as offering incentives for long-term contracts or interventions for high-risk internet service customers.
  • Visual Analysis:
    • Visual diagnostics further support the model’s validity.
    • The time-dependent ROC curves show consistently high AUC values (~0.95–0.99) up to mid-tenure, a gradual decline at later tenures, and excellent generalization across training and test sets.
    • The partial effects plots clearly illustrate that customers using fiber optic services experience far more rapid churn compared to DSL or those without internet service.
    • Lastly, quartile-based survival functions cleanly stratify risk groups, confirming that the model effectively separates high-risk and low-risk customers.

I. Recommendations

  1. Shift Away from Month-to-Month Contracts
    • Issue: Under the Cox model, customers on month-to-month contracts have a churn hazard roughly 5 to 23 times that of customers on one-year and two-year plans, respectively.
    • Recommendation: Promote annual or two-year plans through discounts, loyalty rewards, or early-lock-in benefits to improve customer retention.
  2. Target High-Risk Internet Service Segments
    • Issue: Customers using Fiber Optic services show drastically higher churn risk (up to 55 times) than those without internet service.
    • Recommendation: Identify dissatisfaction drivers (e.g., pricing, performance expectations) among fiber customers. Consider bundling support services, better onboarding, or satisfaction guarantees to mitigate churn.
  3. Strengthen Support Service Adoption
    • Issue: Customers without OnlineSecurity, TechSupport, or OnlineBackup are twice as likely to churn.
    • Recommendation: Create campaigns that emphasize the value of bundled services, or offer free trials for new customers to increase feature adoption.
  4. Segment & Support Vulnerable Demographics
    • Issue: Churn is notably higher among senior citizens, those without partners, and customers without dependents.
    • Recommendation: Tailor customer engagement with personalized communication and support plans, potentially through age-sensitive or household-sensitive retention offers.
  5. Incentivize Payment Method Improvements
    • Issue: Customers using Electronic Check face the highest churn (≈45%).
    • Recommendation: Encourage automatic payments (credit card, bank transfer) with incentives or discounts, while simplifying transitions away from risky payment methods.

J. Next Steps

  1. Enhance Hazard Modeling

    • Incorporate Updated Behavioral Data: Continuously retrain the hazard model with the latest customer behavior and interaction data to ensure risk estimates remain current and reflective of evolving patterns.
    • Explore Non-Linear Techniques: Investigate more flexible models (e.g., survival random forests, neural survival models, or gradient-boosted AFT models) to capture complex, non-linear relationships that the Cox model may overlook.
    • Model Segmentation: Build segmented hazard models for distinct customer groups (e.g., senior citizens, fiber customers) to uncover group-specific churn dynamics.
  2. Improve Model Explainability

    • Integrate SHAP or PDP Tools: Leverage SHAP (SHapley Additive exPlanations) or Partial Dependence Plots to provide actionable, feature-level insights into individual churn predictions. This enhances transparency and facilitates stakeholder buy-in.
    • Develop Explainable Reporting Interfaces: Generate human-readable summaries of why a specific customer is flagged as high-risk, making the model outputs interpretable for both technical and non-technical teams.
  3. Deploy Real-Time Churn Monitoring and Intervention Tools

    • Build Interactive Survival Dashboards: Visualize real-time survival curves, churn probabilities, and hazard scores, with segmentation by tenure, contract type, internet service, and other key attributes.
    • Enable Smart Alerts for Proactive Retention: Implement automated alert systems to flag high-risk customers early in their tenure, enabling timely and personalized retention actions.
    • Integrate Risk Scoring into CRM Workflows: Push hazard scores and churn flags directly into CRM systems or customer service dashboards to assist agents in prioritizing outreach and tailoring retention offers.
    • Track Intervention Outcomes: Monitor the effectiveness of retention strategies deployed for high-risk segments, and loop this feedback into model retraining cycles.

About

Leveraged survival analysis techniques (Kaplan-Meier, Cox model) to estimate customer lifetime and pinpoint periods of elevated churn risk.
