perf: add caching layer for Stripe subscription/price API calls #6790

@beastoin

Problem

The subscription endpoints make multiple synchronous Stripe API calls per request with insufficient caching, causing HTTP 408 timeouts in production. Observed repeated 408s on GET /v1/users/me/subscription in prod logs (2026-04-17 21:46–00:20 UTC).

Each request to the subscription endpoint can make up to 7 Stripe API calls:

  1. Stripe.Subscription.retrieve() — get current price ID
  2. Stripe.Price.retrieve() × 6 (3 plans × 2 intervals) — build available plans

The available-plans endpoint (GET /v1/payments/available-plans) is worse — zero caching, plus an additional Stripe.Subscription.retrieve() and subscription schedule lookups.

Current Caching State

| Data | Endpoint | Cached? | Details |
|---|---|---|---|
| Stripe Price (by price_id) | /v1/users/me/subscription | ✅ Redis, 24h TTL | stripe_price:{price_id} key |
| Stripe Price (by price_id) | /v1/payments/available-plans | ❌ None | Same data, but no cache |
| Stripe Subscription (by sub_id) | Both endpoints | ❌ None | Called every request for current_price_id |
| User subscription (Firestore) | Both endpoints | ❌ None | Single doc read, fast |
| App review config | /v1/users/me/subscription | ✅ Memory, 60s TTL | should_hide_subscription_ui |

Architecture

Layer 1: Stripe Price Cache (extend existing)

  • What: Stripe.Price.retrieve(price_id) results
  • Where: Redis, key stripe_price:{price_id}
  • TTL: 24 hours (prices change rarely — only on plan restructure)
  • Scope: Already implemented in users.py subscription endpoint. Extend to payment.py available-plans endpoint using the same get_generic_cache/set_generic_cache pattern.
  • Invalidation: Manual — flush keys when price IDs change (deploy-time). Acceptable since price changes are rare and planned.
  • Files: backend/routers/payment.py lines 256-293
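
A minimal sketch of the read-through pattern Layer 1 describes, for illustration only. The real get_generic_cache/set_generic_cache live in database.redis_db and their exact signatures are not shown in this issue, so the in-memory versions below are stand-ins with assumed signatures. get_cached_price, fake_fetch, the price ID and the unit_amount value are all hypothetical.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

PRICE_TTL_SECONDS = 24 * 60 * 60  # 24 hours, per the Layer 1 description

# In-memory stand-in for Redis so the sketch runs on its own.
_store: Dict[str, Tuple[float, Any]] = {}


def get_generic_cache(key: str) -> Optional[Any]:
    """Stand-in (assumed signature): cached value, or None if missing/expired."""
    entry = _store.get(key)
    if entry is None:
        return None
    expires_at, value = entry
    if time.monotonic() >= expires_at:
        _store.pop(key, None)
        return None
    return value


def set_generic_cache(key: str, value: Any, ttl: int) -> None:
    """Stand-in (assumed signature): store value with a TTL in seconds."""
    _store[key] = (time.monotonic() + ttl, value)


def get_cached_price(price_id: str, fetch_price: Callable[[str], dict]) -> dict:
    """Read-through cache: try the cache first, call Stripe only on a miss."""
    key = f"stripe_price:{price_id}"
    cached = get_generic_cache(key)
    if cached is not None:
        return cached
    price = fetch_price(price_id)  # in the real code: the Stripe price lookup
    set_generic_cache(key, price, ttl=PRICE_TTL_SECONDS)
    return price


# Usage with a fake Stripe fetcher that records how often it is called.
calls = []


def fake_fetch(price_id: str) -> dict:
    calls.append(price_id)
    return {"id": price_id, "unit_amount": 1999}  # made-up payload


first = get_cached_price("price_neo_monthly", fake_fetch)
second = get_cached_price("price_neo_monthly", fake_fetch)  # served from cache
```

Passing the fetcher in as an argument keeps the helper usable from both users.py and payment.py without either importing the other.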

Layer 2: Stripe Subscription Cache (new)

  • What: Stripe.Subscription.retrieve(subscription_id) results (status, current price, schedule)
  • Where: Redis, key stripe_sub:{subscription_id}
  • TTL: 5–10 minutes (short — changes on upgrade/cancel/renewal)
  • Scope: Both users.py (line 807) and payment.py (line 194)
  • Invalidation:
    • TTL-based (5-10 min) covers most cases
    • Explicit invalidation on write paths: upgrade_subscription_endpoint, cancel_subscription, Stripe webhook handler (stripe_webhook)
    • On invalidation, delete stripe_sub:{subscription_id} so next read fetches fresh
  • Files: backend/routers/users.py line 807, backend/routers/payment.py lines 193-228
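
The TTL plus explicit invalidation described above could look roughly like the sketch below. It is an assumption-heavy illustration: FakeRedis is an in-memory stand-in, get_cached_subscription and fake_fetch are hypothetical names, and the subscription payload is invented. Only the key format and the invalidate_stripe_sub_cache name come from this issue.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

SUB_TTL_SECONDS = 5 * 60  # lower end of the proposed 5-10 minute range


class FakeRedis:
    """Tiny in-memory stand-in for the Redis helpers (illustration only)."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None or time.monotonic() >= entry[0]:
            self._data.pop(key, None)
            return None
        return entry[1]

    def set(self, key: str, value: Any, ttl: int) -> None:
        self._data[key] = (time.monotonic() + ttl, value)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


cache = FakeRedis()


def sub_cache_key(subscription_id: str) -> str:
    return f"stripe_sub:{subscription_id}"


def get_cached_subscription(
    subscription_id: str, fetch_subscription: Callable[[str], dict]
) -> dict:
    """Return the cached subscription, fetching from Stripe on a miss."""
    key = sub_cache_key(subscription_id)
    cached = cache.get(key)
    if cached is not None:
        return cached
    sub = fetch_subscription(subscription_id)
    cache.set(key, sub, ttl=SUB_TTL_SECONDS)
    return sub


def invalidate_stripe_sub_cache(subscription_id: str) -> None:
    """Call from write paths (upgrade, cancel, webhook) so the next read is fresh."""
    cache.delete(sub_cache_key(subscription_id))


# Simulated Stripe state and a fetcher that records each call.
stripe_state = {"sub_123": {"id": "sub_123", "status": "active", "price_id": "price_neo_monthly"}}
fetches = []


def fake_fetch(subscription_id: str) -> dict:
    fetches.append(subscription_id)
    return dict(stripe_state[subscription_id])


before = get_cached_subscription("sub_123", fake_fetch)
stripe_state["sub_123"]["price_id"] = "price_operator_monthly"  # simulated upgrade
stale = get_cached_subscription("sub_123", fake_fetch)  # cache still has the old price
invalidate_stripe_sub_cache("sub_123")
fresh = get_cached_subscription("sub_123", fake_fetch)  # refetched after invalidation
```

The stale read in the middle is the window the write-path invalidation is meant to close. Without it, a user could see the old plan for up to the full TTL after upgrading.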

Layer 3: Available Plans Catalog Cache (new, optional)

  • What: The fully-assembled available_plans list (plan definitions + resolved prices)
  • Where: Redis, key available_plans_catalog:{version_gate} (keyed by new_plans_enabled bool)
  • TTL: 1 hour
  • Scope: Both endpoints build the same plan catalog. Could be computed once and shared.
  • Invalidation: Flush on price ID env var changes (deploy-time) or Stripe price updates.
  • Trade-off: Higher complexity. Layer 1+2 may be sufficient — measure before implementing.
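
If Layer 3 turns out to be worth it, one possible shape is sketched below. The "new"/"legacy" gate strings, get_available_plans and fake_build are hypothetical, and the cache is an in-memory stand-in for Redis. The plan names are taken from the call flow in this issue.

```python
import time
from typing import Callable, Dict, List, Tuple

CATALOG_TTL_SECONDS = 60 * 60  # 1 hour, per the Layer 3 description

# In-memory stand-in for Redis: key -> (expiry timestamp, catalog).
_store: Dict[str, Tuple[float, List[dict]]] = {}


def catalog_cache_key(new_plans_enabled: bool) -> str:
    # Assumed encoding of the version gate; the issue only says it is a bool.
    gate = "new" if new_plans_enabled else "legacy"
    return f"available_plans_catalog:{gate}"


def get_available_plans(
    new_plans_enabled: bool, build_catalog: Callable[[bool], List[dict]]
) -> List[dict]:
    """Return the assembled catalog, rebuilding it only on a miss or expiry."""
    key = catalog_cache_key(new_plans_enabled)
    entry = _store.get(key)
    if entry is not None and time.monotonic() < entry[0]:
        return entry[1]
    catalog = build_catalog(new_plans_enabled)
    _store[key] = (time.monotonic() + CATALOG_TTL_SECONDS, catalog)
    return catalog


# Fake builder that records each (expensive) catalog assembly.
builds = []


def fake_build(new_plans_enabled: bool) -> List[dict]:
    builds.append(new_plans_enabled)
    plans = ["neo", "operator", "architect"]
    return [{"plan": p, "interval": i} for p in plans for i in ("monthly", "annual")]


from_subscription_endpoint = get_available_plans(True, fake_build)
from_available_plans_endpoint = get_available_plans(True, fake_build)  # shared entry
legacy_catalog = get_available_plans(False, fake_build)  # separate key per gate value
```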

Call Flow (before vs after)

BEFORE (up to 7 Stripe calls per request; 1 when the price cache is warm):
  Client → /subscription
    → Firestore: get_user_subscription(uid)
    → Stripe: Subscription.retrieve(sub_id)        ← SLOW, no cache
    → Stripe: Price.retrieve(neo_monthly)           ← cached (24h)
    → Stripe: Price.retrieve(neo_annual)            ← cached (24h)
    → Stripe: Price.retrieve(operator_monthly)      ← cached (24h)
    → Stripe: Price.retrieve(operator_annual)       ← cached (24h)
    → Stripe: Price.retrieve(architect_monthly)     ← cached (24h)
    → Stripe: Price.retrieve(architect_annual)      ← cached (24h)

AFTER (0-1 Stripe calls when warm):
  Client → /subscription
    → Firestore: get_user_subscription(uid)
    → Redis: stripe_sub:{sub_id}                    ← HIT (5-10min TTL)
    → Redis: stripe_price:{neo_monthly}             ← HIT (24h TTL)
    → Redis: stripe_price:{neo_annual}              ← HIT
    → Redis: stripe_price:{operator_monthly}        ← HIT
    → Redis: stripe_price:{operator_annual}         ← HIT
    → Redis: stripe_price:{architect_monthly}       ← HIT
    → Redis: stripe_price:{architect_annual}        ← HIT

Implementation Notes

  • Use existing get_generic_cache/set_generic_cache from database.redis_db — no new infra needed
  • Redis is fail-open in this codebase (errors caught + logged, requests proceed with fresh Stripe call)
  • Stripe Subscription cache value should store the full .to_dict() result (same pattern as price cache)
  • Invalidation helpers: add invalidate_stripe_sub_cache(subscription_id) called from upgrade/cancel/webhook paths
  • The reconcile_basic_plan_with_stripe function (line 796) also calls Stripe.Subscription.retrieve — can use the same cache
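
To make the fail-open behavior in the notes above concrete, here is a small sketch under stated assumptions. The existing helpers may already swallow Redis errors internally, in which case no extra wrapper is needed. BrokenRedis, fail_open_get, fail_open_set and cached_fetch are hypothetical names used only for illustration.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger("stripe_cache")


class BrokenRedis:
    """Hypothetical stand-in that simulates a Redis outage."""

    def get(self, key: str) -> Any:
        raise ConnectionError("redis unavailable")

    def set(self, key: str, value: Any, ttl: int) -> None:
        raise ConnectionError("redis unavailable")


def fail_open_get(cache: Any, key: str) -> Optional[Any]:
    """Treat any cache error as a miss so the request can continue."""
    try:
        return cache.get(key)
    except Exception as exc:
        logger.warning("cache read failed for %s: %s", key, exc)
        return None


def fail_open_set(cache: Any, key: str, value: Any, ttl: int) -> None:
    """Log and ignore cache write errors; the caller already has the value."""
    try:
        cache.set(key, value, ttl)
    except Exception as exc:
        logger.warning("cache write failed for %s: %s", key, exc)


def cached_fetch(cache: Any, key: str, ttl: int, fetch: Callable[[], Any]) -> Any:
    cached = fail_open_get(cache, key)
    if cached is not None:
        return cached
    value = fetch()  # fresh Stripe call when the cache misses or is down
    fail_open_set(cache, key, value, ttl)
    return value


# With Redis down, the request still succeeds via the fresh fetch.
result = cached_fetch(
    BrokenRedis(),
    "stripe_sub:sub_123",
    300,
    lambda: {"id": "sub_123", "status": "active"},  # made-up payload
)
```

One consequence worth keeping in mind: during a Redis outage every request falls back to Stripe, so the timeout risk this issue describes returns for the duration of the outage.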

Priority

High — this directly causes 408 timeouts on the subscription endpoint in production, which makes the mobile app's subscription management card disappear silently (subscription data fails to load).
