Retail analytics, recommendation, and dashboarding on Databricks.
RetailPulse is a Databricks-first retail analytics project built around a validated Instacart implementation. It takes grocery order history through a bronze, silver, and gold Delta pipeline, persists OLAP and report tables, surfaces cross-sell recommendations and customer segments, validates a replay-style streaming flow, and publishes a five-page AI/BI dashboard for review.
This repository is now packaged for GitHub in three layers:
- README.md: the project story and journey from idea to implementation
- RESULTS-README.md: the Databricks study, visuals, and validated results
- HOW TO USE.md: the operator guide for rerunning the workflow and adapting it to a new dataset
The validated Databricks implementation covers:
- Deterministic 10% Instacart sampling flow
- Databricks Asset Bundle deployment and sequential rebuild job
- Bronze, silver, and gold Delta pipeline
- Star schema, OLAP report tables, association rules, clustering, streaming validation, and optimize evidence
- Dashboard V2 with five reviewer-facing pages
- Notebook fallback via `notebooks/12_report_pack.py`
- Supplementary predictive and prescriptive deep-dive notebooks for reviewer walkthroughs
- Workspace target: Databricks Free Edition serverless
- Job id: `61936309152043`
- Latest successful run id: `631388168060027`
- Published dashboard: RetailPulse Demo Dashboard
- Dashboard id: `01f1305e8f1a115e8fb2b378bd4d8f99`
- Dashboard revision: `2026-04-05T08:40:02.619Z`
Planned next (not yet implemented in this repository):
- Self-service dataset upload and mapping flow for non-Instacart retail datasets
- Dataset-aware multi-store operation behind a canonical retail contract
- Optional external BI layer after the Databricks story is fully stable
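The deterministic sampling step can be illustrated with a hash-bucket rule: a user id always hashes to the same bucket, so reruns select the identical 10% slice. This is a minimal stdlib sketch of the idea only; the repo's actual sampler, the `in_sample` name, and the seed string are all hypothetical here.

```python
import hashlib

def in_sample(user_id: str, pct: int = 10, seed: str = "retailpulse") -> bool:
    """Deterministically decide whether a user belongs to the sample.

    Hashing the id gives a stable bucket in [0, 100); reruns of the
    sampling step therefore select the same pct% of users every time.
    """
    digest = hashlib.md5(f"{seed}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < pct

# Example: filter order rows down to sampled users only.
orders = [{"user_id": "7", "order_id": "a"}, {"user_id": "42", "order_id": "b"}]
sampled = [row for row in orders if in_sample(row["user_id"])]
```

Because membership depends only on the id and seed, the same slice can be rebuilt locally or on Databricks without persisting a sample manifest.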
| Executive overview | Recommendations and segments |
|---|---|
| ![]() | ![]() |
| Execution and data quality | Experimental insights |
|---|---|
| ![]() | ![]() |
For the full evidence set, use RESULTS-README.md and Docs/evidence-pack.md.
RetailPulse started as a 2-week Databricks warehouse project idea: take a public retail-like dataset, prove a proper medallion pipeline, build analytics tables that are actually reviewable, and package the work so it survives beyond a notebook demo. The project then evolved through four stages:
1. Foundation: deterministic sampling, Databricks bundle deployment, bronze-silver-gold pipeline, and star schema.
2. Analytics: OLAP outputs, pairwise association-rule mining, clustering, replay-style streaming validation, and optimization evidence.
3. Presentation: screenshot pack, report-pack notebook, published Databricks AI/BI dashboard, and GitHub-facing docs.
4. Release hardening: release checklist, production-state docs, smoke checks, boss-facing walkthrough, and Dashboard V2 polish.
The result is a repo that is not just “some notebooks,” but a packaged analytics system with a validated live run and a reviewable evidence trail.
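The replay-style streaming validation in the Analytics stage boils down to one check: replaying the order history event by event must reproduce the batch aggregate. A minimal stdlib sketch of that check, with hypothetical function and field names:

```python
from collections import Counter

def batch_counts(orders):
    """Batch aggregate: order count per department, computed in one pass."""
    return Counter(o["department"] for o in orders)

def replay_counts(orders):
    """Replay the same history one event at a time, as a stream would,
    keeping a running count that a streaming sink could checkpoint."""
    running = Counter()
    for event in sorted(orders, key=lambda o: o["ts"]):  # event-time order
        running[event["department"]] += 1
    return running

history = [
    {"ts": 1, "department": "produce"},
    {"ts": 2, "department": "dairy"},
    {"ts": 3, "department": "produce"},
]
# Replay validation: streaming replay must match the batch result.
assert batch_counts(history) == replay_counts(history)
```

In the actual pipeline the comparison would run over Delta tables rather than in-memory lists, but the equality check is the same idea.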
```mermaid
flowchart LR
  A[Raw Instacart CSVs] --> B[Deterministic 10% local sample]
  B --> C[Upload to Databricks volume]
  C --> D[Bronze Delta tables]
  D --> E[Silver enriched tables]
  E --> F[Gold facts, dimensions, marts]
  F --> G[OLAP and report tables]
  F --> H[Association rules and clustering]
  F --> I[Experimental insights]
  F --> J[Replay streaming validation]
  G --> K[Dashboard V2]
  H --> K
  I --> K
  J --> K
  G --> L[12_report_pack.py fallback]
  I --> M[13_predictive_analysis.py]
  H --> N[14_prescriptive_analysis.py]
```
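The bronze-to-gold flow in the diagram can be illustrated with a toy in-memory version. The real pipeline runs on Delta tables in Databricks; the function names and the quarantine behavior below are illustrative assumptions, not the repo's actual code.

```python
# Bronze: raw rows exactly as loaded, no cleaning applied.
bronze = [
    {"order_id": "1", "product": " Bananas ", "qty": "2"},
    {"order_id": "1", "product": "Milk", "qty": "x"},  # invalid quantity
]

def to_silver(rows):
    """Silver: typed and trimmed rows; unparseable rows are quarantined."""
    clean, rejects = [], []
    for r in rows:
        try:
            clean.append({"order_id": r["order_id"],
                          "product": r["product"].strip(),
                          "qty": int(r["qty"])})
        except ValueError:
            rejects.append(r)
    return clean, rejects

def to_gold(rows):
    """Gold: an aggregated mart, here total units per product."""
    mart = {}
    for r in rows:
        mart[r["product"]] = mart.get(r["product"], 0) + r["qty"]
    return mart

silver, rejected = to_silver(bronze)
gold = to_gold(silver)  # {"Bananas": 2}; the bad Milk row is quarantined
```

Keeping the bad row in a reject set rather than dropping it silently is what makes the data-quality page of the dashboard possible to populate.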
If you are reviewing the repo on GitHub, use this order:
- RESULTS-README.md
- HOW TO USE.md
- Docs/showcase-summary.md
- Docs/RetailPulse Handbook.md
- Docs/current-production-state.md
The live Databricks dashboard is organized into five pages:
1. Executive Overview
2. Order Behavior
3. Recommendations And Segments
4. Execution And Data Quality
5. Experimental Insights And Performance
That same page order is mirrored in the packaged evidence and the fallback notebook.
For deeper walkthroughs after the five-page story, use `notebooks/13_predictive_analysis.py` for the exploratory predictive lane, `notebooks/14_prescriptive_analysis.py` for the action-oriented recommendation and segmentation lane, and Docs/dashboard-output-diagrams.md for the widget-by-widget dashboard explanation.
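The pairwise association-rule mining behind the cross-sell lane amounts to counting item pairs across baskets and scoring each directed rule by support, confidence, and lift. A stdlib sketch of that computation on toy baskets; the notebook's actual thresholds and implementation may differ:

```python
from itertools import combinations
from collections import Counter

baskets = [
    {"bananas", "milk"},
    {"bananas", "milk", "bread"},
    {"bananas", "bread"},
    {"milk"},
]

n = len(baskets)
item_counts = Counter(i for b in baskets for i in b)
pair_counts = Counter(
    pair for b in baskets for pair in combinations(sorted(b), 2)
)

# Each unordered pair yields two directed rules lhs -> rhs.
rules = []
for (a, b), both in pair_counts.items():
    for lhs, rhs in ((a, b), (b, a)):
        support = both / n                         # P(lhs and rhs)
        confidence = both / item_counts[lhs]       # P(rhs | lhs)
        lift = confidence / (item_counts[rhs] / n) # vs. baseline P(rhs)
        rules.append((lhs, rhs, support, confidence, lift))
```

Rules with lift above 1 indicate the right-hand item is bought more often with the left-hand item than its overall popularity would predict, which is the signal a cross-sell recommendation needs.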
- The repo contains a validated Instacart implementation on Databricks.
- The live dashboard, screenshot pack, and report-pack notebook are all real and aligned.
- Classifier and regression outputs remain in an Experimental Insights lane.
- A self-service upload and mapping system where a retailer can provide order-item CSVs and RetailPulse can normalize them, analyze them, publish a dashboard, and produce actionable output files.
That future system is planned next. It is not already implemented in this repository.
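One way that planned mapping flow could work (entirely hypothetical; nothing in this repository implements it yet) is a column-mapping contract that renames retailer-specific CSV headers onto a canonical order-item schema before the medallion pipeline runs:

```python
# Hypothetical canonical contract for order-item rows.
CANONICAL = ("order_id", "user_id", "product_name", "quantity")

def normalize(rows, mapping):
    """Rename retailer-specific columns onto the canonical contract.

    `mapping` maps each canonical field to the retailer's header name;
    a missing mapping fails fast instead of producing a partial table.
    """
    missing = [c for c in CANONICAL if c not in mapping]
    if missing:
        raise ValueError(f"unmapped canonical fields: {missing}")
    return [{c: row[mapping[c]] for c in CANONICAL} for row in rows]

# Example: a retailer's headers mapped onto the contract.
raw = [{"OrderNo": "1001", "Shopper": "u7", "Item": "Milk", "Units": "2"}]
mapping = {"order_id": "OrderNo", "user_id": "Shopper",
           "product_name": "Item", "quantity": "Units"}
canonical_rows = normalize(raw, mapping)
```

Everything downstream of the bronze layer could then stay dataset-agnostic, since it would only ever see canonical field names.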
For operational detail, see:
- Docs/current-production-state.md
- Docs/production-runbook.md
- Docs/release-checklist.md
- sql/release_smoke_checks.sql
- RetailPulse currently proves a validated Instacart analytics implementation, not a generic upload-any-retail-CSV product.
- Dashboard V2 is implemented and validated now.
- The generic self-service uploader and mapper are planned next.
- The supervised ML outputs are exploratory and are not operational decision drivers in the current release.