7 changes: 7 additions & 0 deletions .gitignore
@@ -74,3 +74,10 @@ package-lock.json

# Session-specific HTML scratchpads
today_progress.html

# Personal design-spec / brief drafts (kept locally, not part of source)
PROJECT_BRIEF.html
plot_design_spec.html
plot_architecture_visual_flow.html.pdf
Plot — Project Brief.pdf
*.pdf
207 changes: 129 additions & 78 deletions README.md
@@ -1,144 +1,195 @@
# Plot

Group hangout planner for San Francisco. Two or more people pick their preferences (budget, categories, distance, vibe) — Plot merges them and recommends ranked venues + events from live data.

Built as an end-to-end **MLOps prototype**: data scraping → training → serving with model + LLM rerank → feedback loop → automated weekly retrain → monitoring + backup.

---

## Plot demo

https://github.com/user-attachments/assets/5f21561c-dff5-48d1-b652-23974b1b6329

---

Group date and hangout planner for San Francisco. Plot helps two or more people coordinate outings — dinner, events, activities — by merging everyone's preferences (budget, cuisine, distance, availability) and recommending ranked options pulled from live venue and event data.
## Try it live

Built as an MLOps class project. See [INFRASTRUCTURE.md](INFRASTRUCTURE.md) for the full system design, GCP stack, and cost justification.
| Service | URL |
|---|---|
| **Web app** | https://plot-ui-773940296505.us-central1.run.app |
| **API** | https://plot-decision-engine-773940296505.us-central1.run.app |
| **API health** | [`/health`](https://plot-decision-engine-773940296505.us-central1.run.app/health) |
| **Cost dashboard** | `/admin/llm-cost?days=7` |

Both services run on Google Cloud Run (`min=1` instance, no cold starts; rate-limited 100 req/min/IP).

---

## Current state
## Architecture

- System design doc ([INFRASTRUCTURE.md](INFRASTRUCTURE.md))
- **Decision Engine** FastAPI service v0.3.0: `/`, `/health`, `/recommend`, `/feedback` ([decision_engine.py](decision_engine.py))
- **BigQuery** venue and event retrieval layer ([recommendation_bigquery.py](recommendation_bigquery.py))
- **Supabase** user, recommendation-log, and feedback storage ([db.py](db.py))
- **LLM reranker** (`gpt-4o-mini`) takes the v0 top-20 and produces a final top-K with per-venue reasons; falls back to v0 heuristic on any failure or missing key ([llm_rerank.py](llm_rerank.py), [prompts/rerank_v1.txt](prompts/rerank_v1.txt))
- **Data scraping** pipelines for Google Places and Ticketmaster ([Data_scraping /README.md](Data_scraping%20/README.md))
- **Browser demo UI** with a banner showing whether results were LLM-ranked or v0 ([demo/README.md](demo/README.md))
- **CI** with pytest, ruff lint, and ruff format check on every push and PR ([.github/workflows/ci.yml](.github/workflows/ci.yml))
- **Tests**: unit tests for scoring / preference merging / price-level normalization, offline LLM rerank tests with a fake OpenAI client, `/recommend` wiring tests with mocked BigQuery + LLM, plus opt-in BigQuery integration tests
```mermaid
flowchart LR
subgraph Data["Data layer"]
Places[Google Places API]
TM[Ticketmaster API]
BQ[(BigQuery<br/>places_raw)]
Mirror[(BigQuery<br/>plot_supabase_mirror)]
end

subgraph App["Live serving"]
UI[Web UI<br/>Cloud Run]
API[Decision Engine<br/>FastAPI on Cloud Run]
Sup[(Supabase<br/>Postgres)]
OAI[OpenAI<br/>gpt-4o-mini]
end

subgraph ML["MLOps loop"]
Build[build_training_data.py]
Train[train_ranker.py]
MLF[MLflow]
Models[(models/<br/>plot_ranker_*.joblib)]
end

Places --> BQ
TM --> BQ

UI -->|/recommend| API
API -->|fetch venues| BQ
API -->|score with v1 GBT<br/>+ LLM rerank| OAI
API -->|fallback to v0| API
API -->|log| Sup

Sup -->|weekly join| Build
Build --> Train
Train --> MLF
Train --> Models
Models -->|baked into Docker| API

Sup -->|weekly mirror| Mirror
```

What's coming next: prompt versioning (`rerank_v2`), MLflow prompt registry, eval pipeline that replays logged feedback through prompts, Google Calendar FreeBusy integration, Cloud Run deployment, drift monitoring.
The trained ranker (scikit-learn `GradientBoostingClassifier`) loads at API startup and scores candidates; gpt-4o-mini then reranks the top 20. Both layers fall back gracefully: if the model file is missing, the v0 heuristic ranks; if OpenAI returns a 429, v0 ranks.
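
The two-layer fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code in `ranker.py` or `llm_rerank.py`: the function name `rank_with_fallback`, the `v0_score` field, and the callable signatures are all hypothetical, and the sketch assumes that a failed LLM call simply keeps whatever order the first layer produced.

```python
from typing import Callable, Optional, Sequence

Venue = dict  # hypothetical candidate record, e.g. {"name": ..., "v0_score": ...}


def rank_with_fallback(
    candidates: Sequence[Venue],
    model_score: Optional[Callable[[Venue], float]],
    llm_rerank: Optional[Callable[[list], list]],
    top_n: int = 20,
) -> tuple[list, bool]:
    """Return (ranked venues, used_llm)."""
    # Layer 1: use the trained model if one was loaded, otherwise the
    # v0 heuristic score stored on each candidate (assumed field name).
    score = model_score if model_score is not None else (lambda v: v["v0_score"])
    ranked = sorted(candidates, key=score, reverse=True)[:top_n]

    # Layer 2: optional LLM rerank. No client configured means no rerank.
    if llm_rerank is None:
        return ranked, False
    try:
        return llm_rerank(ranked), True
    except Exception:
        # Any LLM failure (rate limit, timeout, bad output) keeps layer-1 order.
        return ranked, False
```

Returning the `used_llm` flag alongside the list mirrors the `used_llm` field in the response, so a caller can tell which path produced the ranking.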

---

## What's actually there (every box ticked)

| Layer | Implementation |
|---|---|
| **Data scraping** | Google Places + Ticketmaster → BigQuery, automated via Cloud Run Jobs + Cloud Scheduler. Manual fallback workflows in `.github/workflows/scrape_*.yml` |
| **Storage** | Supabase Postgres for users, groups, recommendation_log, feedback, group_votes. `db.py` is plain SQL — swap `DATABASE_URL` to migrate |
| **API** | FastAPI on Cloud Run. Endpoints: `/recommend`, `/feedback`, `/events`, `/parse`, `/groups/*`, `/admin/llm-cost`, `/health` |
| **Trained ranker** | `sklearn.GradientBoostingClassifier` trained on real `feedback` rows. Serves at request time via `ranker.py`. `model_version` stamped on every recommendation_log row for A/B comparison |
| **LLM rerank** | gpt-4o-mini reranks v0 top-20 with per-venue reasons (≈$0.0005/call). Prompt versioning via `prompt_version` field |
| **LLM intent parser** | `/parse` turns free text ("chill cocktail night") into structured prefs |
| **Retrain pipeline** | Mondays 07:00 UTC. `build_training_data.py` → Supabase join → `train_ranker.py` → MLflow log → GitHub Release with new `.joblib`. Promotion is manual (drop new artifact in `models/` + redeploy) so a bad week doesn't ship to prod |
| **Cost monitoring** | `GET /admin/llm-cost?days=7` returns total, p50/p95 latency, daily series, breakdown by `model_version` |
| **Backup + analytics** | Mondays 08:00 UTC: weekly Supabase → BigQuery mirror so the same SQL workflow can query user data alongside scraped data |
| **CI/CD** | GitHub Actions: ruff lint + the non-live test subset (83 of the 88 tests) on every push, blocked on red. `gcloud builds submit` deploys via Cloud Build |
| **Demo hardening** | 100 req/min/IP rate limit (FastAPI Depends, in-memory sliding window), Cloud Run `min=1 / max=50` for no cold starts and surge headroom |
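
As an illustration of the in-memory sliding-window limit mentioned in the table, here is a minimal sketch. The class name `SlidingWindowLimiter` and its parameters are hypothetical and not taken from `decision_engine.py`; only the 100-requests-per-minute default comes from the table above.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Per-key sliding-window limiter (hypothetical sketch, in-memory only)."""

    def __init__(self, limit=100, window_s=60.0, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock  # injectable so tests can control time
        self._hits = defaultdict(deque)  # key -> timestamps of recent requests

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

In a FastAPI app, a dependency could call `allow()` with the client IP and raise `HTTPException(status_code=429)` when it returns `False`. Because the state lives in process memory, each Cloud Run instance would enforce its own count under this design.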

---

## Tech stack

**Backend** Python 3.11, FastAPI, Pydantic v2, psycopg2 · **Frontend** React 18 (loaded via Babel standalone — no build step), DM Sans + Bricolage Grotesque · **ML** scikit-learn, pandas, MLflow · **LLM** OpenAI gpt-4o-mini · **Data** Google BigQuery, Supabase Postgres · **Deploy** Google Cloud Run, Artifact Registry, Cloud Build · **CI** GitHub Actions, ruff, pytest, pre-commit

---

## Repo layout

| Path | Purpose |
|------|---------|
| [decision_engine.py](decision_engine.py) | FastAPI service — group preference merging, venue scoring, LLM rerank wiring, recommendation + feedback endpoints |
| [llm_rerank.py](llm_rerank.py) | OpenAI-backed reranker that turns the v0 top-20 into a final top-K with per-venue reasons |
| [prompts/](prompts/) | Versioned prompt templates loaded by the reranker (`rerank_v1.txt`) |
| [recommendation_bigquery.py](recommendation_bigquery.py) | BigQuery helpers for fetching venues and events |
| [db.py](db.py) | Supabase (Postgres) layer for users, recommendation logs, and feedback |
| [INFRASTRUCTURE.md](INFRASTRUCTURE.md) | System design, GCP stack, cost estimate, ML model strategy |
| [Data_scraping /](Data_scraping%20/) | Google Places + Ticketmaster → BigQuery pipelines |
| [demo/](demo/) | Standalone browser UI that calls `/recommend` |
| [tests/](tests/) | Unit tests, offline LLM + `/recommend` tests, opt-in BigQuery integration tests |
| [FastAPI/](FastAPI/) | Week 1 wine-classifier exercise (legacy, kept for reference) |
| [.github/workflows/ci.yml](.github/workflows/ci.yml) | GitHub Actions — lint + test on every push / PR |
| [pyproject.toml](pyproject.toml) | Ruff + pytest config (incl. `live` marker) |
| [.pre-commit-config.yaml](.pre-commit-config.yaml) | Pre-commit hook running ruff on staged files |
| [cloudbuild.yaml](cloudbuild.yaml) | Google Cloud Build — builds the Decision Engine Docker image |
| [decision_engine.py](decision_engine.py) | FastAPI service — preference merging, scoring, LLM rerank wiring, all endpoints |
| [ranker.py](ranker.py) | Loads trained `.joblib` at startup, scores candidates, falls back to v0 if missing |
| [llm_rerank.py](llm_rerank.py) | gpt-4o-mini reranker with full v0 fallback |
| [llm_intent.py](llm_intent.py) | Free-text → structured prefs for `/parse` |
| [recommendation_bigquery.py](recommendation_bigquery.py) | BigQuery fetchers for venues + events |
| [db.py](db.py) | Supabase layer (raw SQL via psycopg2) |
| [build_training_data.py](build_training_data.py) | Joins recommendation_log ⨝ feedback → CSV |
| [notebooks/train_ranker.py](notebooks/train_ranker.py) | Trains GBT ranker with NDCG@5 vs v0 baseline, logs to MLflow |
| [categories.py](categories.py) | Single source of truth for the 10 canonical categories |
| [prompts/](prompts/) | Versioned LLM prompt templates (`rerank_v1.txt`, `parse_intent_v1.txt`) |
| [UI/](UI/) | React app — chip-based prefs, group lobby, recs, voting, memories |
| [Data_scraping /](Data_scraping%20/) | Google Places + Ticketmaster → BigQuery pipelines |
| [scripts/](scripts/) | Idempotent Cloud Run setup scripts + Supabase→BQ mirror |
| [tests/](tests/) | 88 tests — unit, mocked-integration, opt-in live |
| [.github/workflows/](.github/workflows/) | CI, weekly retrain, weekly Supabase→BQ mirror, scraper fallbacks |
| [INFRASTRUCTURE.md](INFRASTRUCTURE.md) | System design, GCP cost model, deployment philosophy |
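
For readers unfamiliar with the NDCG@5 metric that the training script reports, the following is a minimal sketch of one common formulation (linear gain, log2 position discount). The function names are hypothetical, and `notebooks/train_ranker.py` may compute the metric differently, for example through scikit-learn.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 5) -> float:
    """Discounted cumulative gain over the first k items, in ranked order."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int = 5) -> float:
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

Here `relevances` would be the feedback labels (for example 1 for accepted, 0 for rejected) listed in the order a ranker placed the venues. Comparing the score for the trained ranker's order against the v0 order on the same candidate set gives the baseline comparison.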

---

## Quick start

Five-minute path to a running local service.

```bash
# 1. Setup
git clone git@github.com:saisri27/Plot_MLops.git
cd Plot_MLops
conda create -n plot python=3.11 -y
conda activate plot
conda create -n plot python=3.11 -y && conda activate plot
pip install -r requirements.txt
pre-commit install

# 2. Credentials
cp "Data_scraping /.env.example" .env
# Edit .env: fill in GCP_PROJECT, DATABASE_URL, MLFLOW_TRACKING_URI, OPENAI_API_KEY
gcloud auth application-default login # for BigQuery
# Fill in: GCP_PROJECT, DATABASE_URL, OPENAI_API_KEY, GOOGLE_PLACES_API_KEY, TICKETMASTER_API_KEY
gcloud auth application-default login # for BigQuery reads

# 3. Run the API
uvicorn decision_engine:app --reload --port 8080
curl http://127.0.0.1:8080/health

# 4. (Optional) Run the browser demo UI
python3 -m http.server 5500
# open http://127.0.0.1:5500/demo/demo.html
# 4. Run the UI (separate terminal)
cd UI && python3 -m http.server 5500
# open http://127.0.0.1:5500/Plot.html
```

If `/recommend` returns 503 with a BigQuery error, you skipped `gcloud auth application-default login` — the API can't read from BigQuery without it.
If `/recommend` returns 503 with a BigQuery error, you skipped step 2's ADC login.

### LLM reranker (optional)
If `OPENAI_API_KEY` isn't set, `/recommend` silently falls back to the trained ranker (or v0 heuristic) with no LLM reasons — the demo still works.

Get a key at https://platform.openai.com/api-keys and put it in `.env` as `OPENAI_API_KEY`.
---

- **With key set**: `/recommend` reranks the v0 top-20 with `gpt-4o-mini` and returns LLM-written reasons. Cost is roughly $0.0005 per call. The demo UI shows an "LLM-ranked" banner above the results, including the model and latency.
- **Without key**: the engine logs a one-time warning at startup and silently falls back to v0 heuristic ranking. Demo UI shows the "Heuristic ranking (v0)" banner.
## Testing

The fallback path is also taken on any LLM error (timeout, malformed response, all picks hallucinated), so a flaky API never breaks `/recommend`.
```bash
# Full suite, mocked live deps (matches CI)
pytest tests/ -v -m "not live"

### Response shape
# Lint + format
ruff check . && ruff format --check .

`POST /recommend` returns the top-K recommendations plus rerank metadata:
# Live OpenAI test (needs key)
pytest tests/test_llm_rerank.py -v -m live

```
{
"merged_budget": "...", "merged_max_distance_km": ...,
"merged_categories": [...], "group_size": ..., "venues_scored": ...,
"recommendations": [{"name": "...", "score": ..., "reason": "...", ...}],
"used_llm": true, // false on v0 fallback
"llm_model": "gpt-4o-mini", // null on fallback
"prompt_version": "rerank_v1", // null on fallback
"llm_latency_ms": 812, // null on fallback
"recommendation_log_id": 42 // null when DATABASE_URL is unset
}
# BigQuery integration (needs ADC)
RUN_BQ_INTEGRATION=1 pytest tests/test_bigquery_integration.py -v
```

`recommendation_log_id` is the SERIAL id from the `recommendation_log` table — the `/feedback` retraining loop will use it as a join key to reconstruct the candidate set behind each accepted/rejected pick.
**88 tests total** — unit (43), mocked integration (40), live (5). CI runs the non-live subset on every push.

---

## Testing
## Deployment

```bash
# Unit + offline integration tests, LLM live-test skipped (matches CI behavior)
pytest tests/ -v -m "not live" --ignore=tests/test_bigquery_integration.py
# API (rebuilds Docker image with the latest model in models/)
source .env && bash scripts/setup_cloud_run_api.sh

# Live OpenAI integration test (needs OPENAI_API_KEY)
pytest tests/test_llm_rerank.py -v -m live

# BigQuery integration tests (opt-in, needs ADC)
RUN_BQ_INTEGRATION=1 pytest tests/test_bigquery_integration.py -v
# UI
bash scripts/setup_cloud_run_ui.sh

# Lint + format check
ruff check .
ruff format --check .
# Scrapers (Cloud Run Jobs + Cloud Scheduler)
source .env && bash scripts/setup_cloud_run_jobs.sh
```

Test files in `tests/`:

- `test_decision_engine.py` — scoring, preference merging, price-level normalization
- `test_recommendation_bigquery.py` — BigQuery helpers with a mocked client
- `test_llm_rerank.py` — LLM reranker with a fake OpenAI client (offline) plus one `@pytest.mark.live` smoke test
- `test_decision_engine_with_llm.py` — `/recommend` wiring with both BigQuery and the LLM mocked out, covering both the LLM-success and v0-fallback paths
- `test_bigquery_integration.py` — opt-in queries against the real `mlops-project-491402.places_raw.*` tables (data-quality assertions)

CI runs lint + `pytest -m "not live"` automatically on every push and pull request. Live LLM tests and BigQuery integration tests are intentionally skipped in CI because they require external credentials.
All deploy scripts are idempotent — re-run after any code change to roll out a new revision. The API URL stays stable across revisions.

---

## Architecture
## License

Three-layer: **Frontend** (React, browser) → **API** (FastAPI Decision Engine on Cloud Run) → **Data** (BigQuery venues/events + Supabase users/feedback + MLflow model registry). Full diagram and per-component justification in [INFRASTRUCTURE.md](INFRASTRUCTURE.md).
MIT. See [LICENSE](LICENSE) if present; otherwise, treat this as a class-project prototype shared for educational reference.

---

Built for the MSDS-694 / 698 MLOps course. See [INFRASTRUCTURE.md](INFRASTRUCTURE.md) for the full system-design doc and cost justification.