An end-to-end Customer Lifetime Value (CLV) and RFM segmentation pipeline built on synthetic fintech data — with BG-NBD & Gamma-Gamma probabilistic models, AI-powered customer explanations, and interactive visualizations.
This project simulates a real-world fintech analytics workflow to predict Customer Lifetime Value, segment customers using RFM scoring, and generate AI-driven insights per customer profile. It mirrors the kind of work done at scale in financial services for retention, targeting, and revenue forecasting.
Built with: Python · Pandas · Lifetimes · Scikit-learn · Plotly · LangChain · Groq LLM · Google Colab
| Feature | Description |
|---|---|
| 📊 Synthetic Dataset | 7,000 customers with 22 features, including spend, transactions, loyalty, and satisfaction |
| 🔁 RFM Segmentation | Customers scored and labeled: Champions, Loyal, At Risk, Hibernating, Lost, etc. |
| 📈 BG-NBD Model | Predicts expected repeat purchases in the next 90 days |
| 💰 Gamma-Gamma Model | Predicts expected average transaction value per customer |
| 🔮 CLV Forecasting | 90-day, 180-day, and 365-day CLV computed per customer |
| 🤖 AI Explanations | An LLM (LLaMA 3.3 70B via Groq) generates plain-English insights for customers in selected segments |
| 📉 Interactive Charts | Plotly dashboards for spend distribution, segment breakdown, CLV by channel |
| 🐙 GitHub Ready | Clean export pipeline with versioning setup |
```
clv-merchant-segmentation/
│
├── CLV_Fintech_Dashboard.ipynb   # Main notebook (all sections)
├── README.md
└── outputs/                      # Generated charts and exports (optional)
```
A synthetic dataset of 7,000 fintech customers is generated with realistic distributions across:
- Demographics: Age, Location, Income Level
- Behavioural: Total Transactions, Active Days, App Usage Frequency
- Financial: Total Spent, Avg/Max/Min Transaction Value, Cashback, LTV
- Categorical: Acquisition Channel, Product Type, Preferred Payment Method
Cleaning and feature engineering steps:
- Outlier capping at the 1st and 99th percentiles
- Null removal on critical columns
- Duplicate customer ID removal
- Feature engineering: `recency_days`, `tenure_days`, `avg_spend_per_txn`
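A minimal pandas sketch of the cleaning and feature-engineering steps above. The toy frame, its column names (`total_spent`, `total_transactions`, `signup_date`, `last_txn_date`) and its distributions are hypothetical stand-ins for the notebook's 22-feature dataset; only the three engineered feature names come from this README.

```python
import numpy as np
import pandas as pd

# Hypothetical toy stand-in for the synthetic dataset.
rng = np.random.default_rng(42)
n = 200
today = pd.Timestamp("2024-12-31")
tenure = rng.integers(30, 1500, size=n)          # days since signup
days_since_last = rng.integers(0, tenure + 1)    # last transaction falls within tenure
df = pd.DataFrame({
    "customer_id": np.arange(n),
    "total_spent": rng.lognormal(mean=9.0, sigma=1.0, size=n),
    "total_transactions": rng.integers(1, 120, size=n),
    "signup_date": today - pd.to_timedelta(tenure, unit="D"),
    "last_txn_date": today - pd.to_timedelta(days_since_last, unit="D"),
})
df = pd.concat([df, df.iloc[[0]]], ignore_index=True)  # inject a duplicate customer ID
df.loc[5, "total_spent"] = np.nan                      # inject a missing value

CRITICAL = ["customer_id", "total_spent", "total_transactions", "signup_date", "last_txn_date"]
CAPPED = ["total_spent", "total_transactions"]

def clean(frame: pd.DataFrame, as_of: pd.Timestamp) -> pd.DataFrame:
    out = frame.dropna(subset=CRITICAL).drop_duplicates(subset="customer_id", keep="first").copy()
    # Winsorise: clip each column to its own 1st-99th percentile range.
    for col in CAPPED:
        lo, hi = out[col].quantile([0.01, 0.99])
        out[col] = out[col].clip(lower=lo, upper=hi)
    # Engineered features named in the README.
    out["recency_days"] = (as_of - out["last_txn_date"]).dt.days
    out["tenure_days"] = (as_of - out["signup_date"]).dt.days
    out["avg_spend_per_txn"] = out["total_spent"] / out["total_transactions"]
    return out

clean_df = clean(df, today)
```

Dropping nulls and duplicates before computing the percentiles keeps repeated rows from shifting the capping thresholds; the README does not say which order the notebook itself uses.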
Each customer is scored 1–5 on Recency, Frequency, and Monetary value using quintile bucketing, where 5 marks the best quintile (most recent, most frequent, or highest-spending). Segments are assigned as follows:
| Segment | Criteria |
|---|---|
| Champions | R≥4, F≥4, M≥4 |
| Loyal | R≥3, F≥3 |
| Potential Loyal | R≥3, F≤2 |
| New | R=5, F=1 |
| At Risk | R≤2, F≥3 |
| Hibernating | R≤2, F≤2, M≥3 |
| Lost | Everything else |
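A sketch of the scoring and labelling logic, assuming `recency_days`, `total_transactions` and `total_spent` serve as the R, F and M inputs (the README does not name them). The rules in the table overlap: every New customer (R=5, F=1) also satisfies Potential Loyal, and every Champion satisfies Loyal, so the order in which rules are checked decides the label. This sketch checks Champions, then New, then the remaining rows top to bottom; that precedence is an assumption, not taken from the notebook.

```python
import pandas as pd

def quintile_score(values: pd.Series, higher_is_better: bool = True) -> pd.Series:
    # Rank first so qcut always gets 5 distinct bin edges, even with many ties.
    ranks = values.rank(method="first")
    labels = [1, 2, 3, 4, 5] if higher_is_better else [5, 4, 3, 2, 1]
    return pd.qcut(ranks, 5, labels=labels).astype(int)

def assign_segment(r: int, f: int, m: int) -> str:
    if r >= 4 and f >= 4 and m >= 4:
        return "Champions"
    if r == 5 and f == 1:
        return "New"              # must precede Potential Loyal, which also matches it
    if r >= 3 and f >= 3:
        return "Loyal"
    if r >= 3 and f <= 2:
        return "Potential Loyal"
    if r <= 2 and f >= 3:
        return "At Risk"
    if r <= 2 and f <= 2 and m >= 3:
        return "Hibernating"
    return "Lost"                 # what remains: R<=2, F<=2, M<=2

def rfm_segments(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Fewer days since the last purchase means more recent, hence a higher R score.
    out["R"] = quintile_score(out["recency_days"], higher_is_better=False)
    out["F"] = quintile_score(out["total_transactions"])
    out["M"] = quintile_score(out["total_spent"])
    out["segment"] = [assign_segment(r, f, m) for r, f, m in zip(out["R"], out["F"], out["M"])]
    return out
```

Ranking with `method="first"` breaks ties by row order, which keeps the five bins equal-sized at the cost of occasionally splitting identical values across adjacent scores.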
- BG-NBD (Beta-Geometric/NBD): Models the buy-till-you-die purchase process to predict future transaction frequency
- Gamma-Gamma: Models monetary value variation to predict expected spend per transaction
- Combined to produce discounted CLV at 90 / 180 / 365-day horizons (10% annual discount rate)
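The closed-form expressions behind both models are short enough to write out directly. The plain-Python sketch below implements the BG/NBD probability-alive and conditional expected-purchases formulas (Fader, Hardie & Lee, 2005), the Gamma-Gamma conditional expected average value, and a month-by-month discounted sum. It is an illustrative re-derivation, not the notebook's code: in the project, the `lifetimes` fitters (`BetaGeoFitter`, `GammaGammaFitter`) estimate the parameters and compute these quantities. Two conventions are assumptions here: the 10% annual rate is converted to an equivalent compounded monthly rate, and a month is treated as 30 days, which follows `lifetimes`' default when working with daily data.

```python
import math

def p_alive(x, t_x, T, r, alpha, a, b):
    """BG/NBD probability that a customer is still active.
    x: repeat purchases, t_x: time of last purchase, T: customer age (same time unit)."""
    if x == 0:
        return 1.0
    odds = (a / (b + x - 1)) * ((alpha + T) / (alpha + t_x)) ** (r + x)
    return 1.0 / (1.0 + odds)

def hyp2f1(a, b, c, z, tol=1e-12, max_terms=10_000):
    """Gauss hypergeometric function 2F1(a, b; c; z) by power series (needs |z| < 1)."""
    term = total = 1.0
    for k in range(max_terms):
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * z
        total += term
        if abs(term) < tol:
            break
    return total

def expected_purchases(t, x, t_x, T, r, alpha, a, b):
    """BG/NBD expected number of purchases in the next t time units."""
    first = (a + b + x - 1) / (a - 1)
    z = t / (alpha + T + t)
    second = 1 - ((alpha + T) / (alpha + T + t)) ** (r + x) * hyp2f1(r + x, b + x, a + b + x - 1, z)
    # The model's denominator 1 + [x > 0] * odds is exactly 1 / p_alive.
    return first * second * p_alive(x, t_x, T, r, alpha, a, b)

def expected_avg_value(x, m_bar, p, q, gamma):
    """Gamma-Gamma conditional expected average transaction value (needs q > 1)."""
    return p * (gamma + x * m_bar) / (p * x + q - 1)

def discounted_clv(months, x, t_x, T, m_bar, bgnbd, gg, annual_rate=0.10, days_per_month=30):
    """Sum expected spend month by month, discounting at the equivalent monthly rate."""
    monthly_rate = (1 + annual_rate) ** (1 / 12) - 1
    value = expected_avg_value(x, m_bar, *gg)
    clv = 0.0
    for i in range(1, months + 1):
        # Expected purchases are cumulative, so take the increment for month i.
        purchases = (expected_purchases(i * days_per_month, x, t_x, T, *bgnbd)
                     - expected_purchases((i - 1) * days_per_month, x, t_x, T, *bgnbd))
        clv += value * purchases / (1 + monthly_rate) ** i
    return clv
```

Under the 30-day convention, `months=3`, `6` and `12` approximate the 90-, 180- and 365-day horizons. `bgnbd` is a hypothetical `(r, alpha, a, b)` tuple and `gg` a `(p, q, gamma)` tuple; the Gamma-Gamma term only makes sense for customers with at least one repeat purchase (`x > 0`).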
Using LangChain and Groq (LLaMA 3.3 70B), each customer row in the targeted segments (Champions and At Risk) is passed to an LLM, which generates a plain-English retention recommendation tailored to that customer's segment, CLV, and behavioural profile.
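A minimal sketch of that per-customer call, assuming the `langchain-core` prompt API and the `ChatGroq` chat model from `langchain-groq`. The prompt wording, the profile field names (`clv_365`, `acquisition_channel`, `product_type`) and the Groq model ID `llama-3.3-70b-versatile` are assumptions; the README does not show the notebook's prompt.

```python
# Hypothetical prompt text; the notebook's actual wording is not shown in the README.
SYSTEM_PROMPT = (
    "You are a fintech retention analyst. In plain English, explain this customer's "
    "profile and give one concrete retention recommendation."
)
# Hypothetical column names for the fields passed to the LLM.
PROFILE_FIELDS = ["segment", "R", "F", "M", "clv_365", "acquisition_channel", "product_type"]

def customer_profile(row: dict) -> str:
    """Render the selected fields of one customer row as 'name: value' lines."""
    return "\n".join(f"{k}: {row[k]}" for k in PROFILE_FIELDS if k in row)

def explain_customer(row: dict) -> str:
    """Ask the LLM for a retention recommendation for one customer (needs GROQ_API_KEY)."""
    # Imported here so customer_profile() works even without the LangChain packages.
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_groq import ChatGroq

    prompt = ChatPromptTemplate.from_messages([
        ("system", SYSTEM_PROMPT),
        ("human", "Customer profile:\n{profile}"),
    ])
    llm = ChatGroq(model="llama-3.3-70b-versatile", temperature=0.2)
    chain = prompt | llm  # the formatted prompt is piped into the chat model
    return chain.invoke({"profile": customer_profile(row)}).content
```

With a scored DataFrame, something like `df.loc[df["segment"] == "At Risk"].head(5).to_dict("records")` yields rows to pass to `explain_customer`; capping the batch size keeps API usage predictable.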
- EDA — Spend distribution, transaction histograms, correlation heatmap
- RFM Analysis — Segment distribution, RFM score scatter, treemap
- CLV Predictions — CLV by segment, acquisition channel, product type
- BG-NBD Validation — Frequency-recency matrix, probability alive chart
- AI Insights — Per-customer LLM explanations for Champions and At Risk segments
```bash
pip install lifetimes==0.11.3 plotly==5.20.0 langchain==0.2.5 \
  langchain-groq==0.1.6 scikit-learn==1.4.2 \
  ipywidgets nbformat kaleido
```
| Service | Purpose | Get it |
|---|---|---|
| Groq API | LLM customer explanations | console.groq.com |
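The Groq key can be supplied through the `GROQ_API_KEY` environment variable, which `ChatGroq` reads when no key is passed explicitly. A small helper, assuming an interactive notebook session for the prompt fallback (the function name is hypothetical):

```python
import os
from getpass import getpass

def ensure_groq_key() -> str:
    """Return GROQ_API_KEY, prompting for it (without echo) if it is not set yet."""
    key = os.environ.get("GROQ_API_KEY")
    if not key:
        key = getpass("Groq API key (from console.groq.com): ")
        os.environ["GROQ_API_KEY"] = key  # picked up by langchain_groq.ChatGroq
    return key
```

In Colab, storing the key under Secrets and reading it with `google.colab.userdata.get("GROQ_API_KEY")` avoids pasting it into the notebook each session.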
- 7,000 customers processed end-to-end
- Customers distributed across 7 RFM segments
- BG-NBD model predicts 90-day purchase probability per customer
- CLV range: ₹0 (churned) to ₹1.2M+ (high-value Champions)
- AI explanations generated for Champion and At Risk cohorts
This project maps directly to real fintech analytics use cases:
- Customer Retention: Identify At Risk and Hibernating segments before churn
- Revenue Forecasting: 90/180/365-day CLV for financial planning
- Campaign Targeting: Personalise offers by segment × acquisition channel
- Portfolio Analysis: Product-level CLV breakdown for cross-sell strategy
Gaurav Yadav
- GitHub: @Gauravscriptx
This project is open-source and available under the MIT License.