Retail analytics, recommendation, and dashboarding on Databricks.
RetailPulse is a Databricks-first retail analytics project built around a validated Instacart implementation. It takes grocery order history through a bronze, silver, and gold Delta pipeline, persists OLAP and report tables, surfaces cross-sell recommendations and customer segments, validates a replay-style streaming flow, and publishes a five-page AI/BI dashboard for review.
This repository is now packaged for GitHub in three layers:
- README.md: the project story and journey from idea to implementation
- RESULTS-README.md: the Databricks study, visuals, and validated results
- HOW TO USE.md: the operator guide for rerunning the workflow and adapting it to a new dataset
The validated Databricks implementation covers:
- Deterministic 10% Instacart sampling flow
- Databricks Asset Bundle deployment and sequential rebuild job
- Bronze, silver, and gold Delta pipeline
- Star schema, OLAP report tables, association rules, clustering, streaming validation, and optimize evidence
- Dashboard V2 with five reviewer-facing pages
- Notebook fallback via `notebooks/12_report_pack.py`
- Supplementary predictive and prescriptive deep-dive notebooks for reviewer walkthroughs
- Workspace target: Databricks Free Edition serverless
- Job id: `61936309152043`
- Latest successful run id: `631388168060027`
- Published dashboard: RetailPulse Demo Dashboard
- Dashboard id: `01f1305e8f1a115e8fb2b378bd4d8f99`
- Dashboard revision: `2026-04-05T08:40:02.619Z`
Planned next (not yet implemented in this repository):
- Self-service dataset upload and mapping flow for non-Instacart retail datasets
- Dataset-aware multi-store operation behind a canonical retail contract
- Optional external BI layer after the Databricks story is fully stable
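The deterministic sampling step can be illustrated with a hash-bucket rule: a user id always hashes to the same bucket, so reruns select the identical 10% slice. This is a minimal stdlib sketch of the idea only; the repo's actual sampler, the `in_sample` name, and the seed string are all hypothetical here.

```python
import hashlib

def in_sample(user_id: str, pct: int = 10, seed: str = "retailpulse") -> bool:
    """Deterministically decide whether a user belongs to the sample.

    Hashing the id gives a stable bucket in [0, 100); reruns of the
    sampling step therefore select the same pct% of users every time.
    """
    digest = hashlib.md5(f"{seed}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < pct

# Example: filter order rows down to sampled users only.
orders = [{"user_id": "7", "order_id": "a"}, {"user_id": "42", "order_id": "b"}]
sampled = [row for row in orders if in_sample(row["user_id"])]
```

Because membership depends only on the id and seed, the same slice can be rebuilt locally or on Databricks without persisting a sample manifest.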
| Executive overview | Recommendations and segments |
|---|---|
| ![]() | ![]() |
| Execution and data quality | Experimental insights |
|---|---|
| ![]() | ![]() |
For the full evidence set, use RESULTS-README.md and Docs/evidence-pack.md.
RetailPulse started as a 2-week Databricks warehouse project idea: take a public retail-like dataset, prove a proper medallion pipeline, build analytics tables that are actually reviewable, and package the work so it survives beyond a notebook demo. The project then evolved through four stages:
1. Foundation: deterministic sampling, Databricks bundle deployment, bronze-silver-gold pipeline, and star schema.
2. Analytics: OLAP outputs, pairwise association-rule mining, clustering, replay-style streaming validation, and optimization evidence.
3. Presentation: screenshot pack, report-pack notebook, published Databricks AI/BI dashboard, and GitHub-facing docs.
4. Release hardening: release checklist, production-state docs, smoke checks, boss-facing walkthrough, and Dashboard V2 polish.
The result is a repo that is not just “some notebooks,” but a packaged analytics system with a validated live run and a reviewable evidence trail.
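The replay-style streaming validation in the Analytics stage boils down to one check: replaying the order history event by event must reproduce the batch aggregate. A minimal stdlib sketch of that check, with hypothetical function and field names:

```python
from collections import Counter

def batch_counts(orders):
    """Batch aggregate: order count per department, computed in one pass."""
    return Counter(o["department"] for o in orders)

def replay_counts(orders):
    """Replay the same history one event at a time, as a stream would,
    keeping a running count that a streaming sink could checkpoint."""
    running = Counter()
    for event in sorted(orders, key=lambda o: o["ts"]):  # event-time order
        running[event["department"]] += 1
    return running

history = [
    {"ts": 1, "department": "produce"},
    {"ts": 2, "department": "dairy"},
    {"ts": 3, "department": "produce"},
]
# Replay validation: streaming replay must match the batch result.
assert batch_counts(history) == replay_counts(history)
```

In the actual pipeline the comparison would run over Delta tables rather than in-memory lists, but the equality check is the same idea.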
```mermaid
flowchart LR
  A[Raw Instacart CSVs] --> B[Deterministic 10% local sample]
  B --> C[Upload to Databricks volume]
  C --> D[Bronze Delta tables]
  D --> E[Silver enriched tables]
  E --> F[Gold facts, dimensions, marts]
  F --> G[OLAP and report tables]
  F --> H[Association rules and clustering]
  F --> I[Experimental insights]
  F --> J[Replay streaming validation]
  G --> K[Dashboard V2]
  H --> K
  I --> K
  J --> K
  G --> L[12_report_pack.py fallback]
  I --> M[13_predictive_analysis.py]
  H --> N[14_prescriptive_analysis.py]
```
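The bronze-to-gold flow in the diagram can be illustrated with a toy in-memory version. The real pipeline runs on Delta tables in Databricks; the function names and the quarantine behavior below are illustrative assumptions, not the repo's actual code.

```python
# Bronze: raw rows exactly as loaded, no cleaning applied.
bronze = [
    {"order_id": "1", "product": " Bananas ", "qty": "2"},
    {"order_id": "1", "product": "Milk", "qty": "x"},  # invalid quantity
]

def to_silver(rows):
    """Silver: typed and trimmed rows; unparseable rows are quarantined."""
    clean, rejects = [], []
    for r in rows:
        try:
            clean.append({"order_id": r["order_id"],
                          "product": r["product"].strip(),
                          "qty": int(r["qty"])})
        except ValueError:
            rejects.append(r)
    return clean, rejects

def to_gold(rows):
    """Gold: an aggregated mart, here total units per product."""
    mart = {}
    for r in rows:
        mart[r["product"]] = mart.get(r["product"], 0) + r["qty"]
    return mart

silver, rejected = to_silver(bronze)
gold = to_gold(silver)  # {"Bananas": 2}; the bad Milk row is quarantined
```

Keeping the bad row in a reject set rather than dropping it silently is what makes the data-quality page of the dashboard possible to populate.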
If you are reviewing the repo on GitHub, use this order:
- RESULTS-README.md
- HOW TO USE.md
- Docs/showcase-summary.md
- Docs/RetailPulse Handbook.md
- Docs/current-production-state.md
The live Databricks dashboard is organized into five pages:
1. Executive Overview
2. Order Behavior
3. Recommendations And Segments
4. Execution And Data Quality
5. Experimental Insights And Performance
That same page order is mirrored in the packaged evidence and the fallback notebook.
For deeper walkthroughs after the five-page story, use `notebooks/13_predictive_analysis.py` for the exploratory predictive lane, `notebooks/14_prescriptive_analysis.py` for the action-oriented recommendation and segmentation lane, and Docs/dashboard-output-diagrams.md for the widget-by-widget dashboard explanation.
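The pairwise association-rule mining behind the cross-sell lane amounts to counting item pairs across baskets and scoring each directed rule by support, confidence, and lift. A stdlib sketch of that computation on toy baskets; the notebook's actual thresholds and implementation may differ:

```python
from itertools import combinations
from collections import Counter

baskets = [
    {"bananas", "milk"},
    {"bananas", "milk", "bread"},
    {"bananas", "bread"},
    {"milk"},
]

n = len(baskets)
item_counts = Counter(i for b in baskets for i in b)
pair_counts = Counter(
    pair for b in baskets for pair in combinations(sorted(b), 2)
)

# Each unordered pair yields two directed rules lhs -> rhs.
rules = []
for (a, b), both in pair_counts.items():
    for lhs, rhs in ((a, b), (b, a)):
        support = both / n                         # P(lhs and rhs)
        confidence = both / item_counts[lhs]       # P(rhs | lhs)
        lift = confidence / (item_counts[rhs] / n) # vs. baseline P(rhs)
        rules.append((lhs, rhs, support, confidence, lift))
```

Rules with lift above 1 indicate the right-hand item is bought more often with the left-hand item than its overall popularity would predict, which is the signal a cross-sell recommendation needs.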
- The repo contains a validated Instacart implementation on Databricks.
- The live dashboard, screenshot pack, and report-pack notebook are all real and aligned.
- Classifier and regression outputs remain in an Experimental Insights lane.
- A self-service upload and mapping system where a retailer can provide order-item CSVs and RetailPulse can normalize them, analyze them, publish a dashboard, and produce actionable output files.
That future system is planned next. It is not already implemented in this repository.
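One way that planned mapping flow could work (entirely hypothetical; nothing in this repository implements it yet) is a column-mapping contract that renames retailer-specific CSV headers onto a canonical order-item schema before the medallion pipeline runs:

```python
# Hypothetical canonical contract for order-item rows.
CANONICAL = ("order_id", "user_id", "product_name", "quantity")

def normalize(rows, mapping):
    """Rename retailer-specific columns onto the canonical contract.

    `mapping` maps each canonical field to the retailer's header name;
    a missing mapping fails fast instead of producing a partial table.
    """
    missing = [c for c in CANONICAL if c not in mapping]
    if missing:
        raise ValueError(f"unmapped canonical fields: {missing}")
    return [{c: row[mapping[c]] for c in CANONICAL} for row in rows]

# Example: a retailer's headers mapped onto the contract.
raw = [{"OrderNo": "1001", "Shopper": "u7", "Item": "Milk", "Units": "2"}]
mapping = {"order_id": "OrderNo", "user_id": "Shopper",
           "product_name": "Item", "quantity": "Units"}
canonical_rows = normalize(raw, mapping)
```

Everything downstream of the bronze layer could then stay dataset-agnostic, since it would only ever see canonical field names.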
For operational detail, see:
- Docs/current-production-state.md
- Docs/production-runbook.md
- Docs/release-checklist.md
- sql/release_smoke_checks.sql
- RetailPulse currently proves a validated Instacart analytics implementation, not a generic upload-any-retail-CSV product.
- Dashboard V2 is implemented and validated now.
- The generic self-service uploader and mapper are planned next.
- The supervised ML outputs are exploratory and are not operational decision drivers in the current release.