Created: 2025-11-30
Status: In Progress
Target: Production-ready SaaS (8.5+/10 readiness score)
Current State: Production-ready MVP (Score: ~7.5/10)
Session Progress: All P0, P1, and P2 (tier enforcement) issues RESOLVED
Remaining: P2 (job persistence), P3 (observability)
Problem: Lines 59, 193, 297, 429, 534 - all endpoint functions are `def` (synchronous), not `async def`.

```python
# Current (BLOCKING):
@router.get("/gps-reliability")
def get_gps_reliability(...):  # Blocks event loop!
    weather = realtime_service.get_latest_weather()  # Sync I/O
```

Impact: Under load, one slow NOAA fetch blocks ALL concurrent requests.
Problem: Lines 38-40 - every Celery task creates a new database engine:

```python
# Current (CONNECTION EXHAUSTION):
@celery_app.task
def log_api_usage(...):
    engine = create_engine(sync_url)  # New engine per task!
```

Impact: 100 concurrent tasks = immediate connection pool exhaustion.
Problem: Lines 73-77 - usage tracking writes on every API request:

```python
api_key.calls_this_month += 1  # Write on every read
await session.commit()         # Blocks response
```

Impact: Every API request triggers a database write, limiting throughput.
Problem: Lines 23-29 - untestable global state:

```python
_prediction_service = PredictionService()  # Global singleton
```

Impact: Cannot inject mocks for unit testing; shared state risks.
| Priority | Issue | Severity | Effort | Status |
|---|---|---|---|---|
| P0 | Blocking I/O in endpoints | Critical | Medium | DONE |
| P0 | DB engine per Celery task | Critical | Low | DONE |
| P0 | Sync usage tracking | High | Medium | DONE |
| P1 | Global singletons | High | Medium | DONE |
| P1 | No test suite | High | High | DONE |
| P1 | CORS ["*"] | High | Low | DONE |
| P2 | Tier enforcement | Medium | Medium | DONE |
| P2 | Job service persistence | Medium | Medium | DONE |
| P3 | Observability stack | Medium | High | DONE |
Convert all endpoint functions from def to async def and ensure all I/O-bound service methods are also async.
Files to modify:
- `app/api/endpoints.py` - All 5 endpoint functions
- `app/services/realtime_service.py` - HTTP calls to NOAA
- `app/services/ionosphere_service.py` - HTTP calls to NOAA
- `app/services/radiation_service.py` - HTTP calls to NOAA
- `app/services/prediction_service.py` - Model loading (can stay sync, CPU-bound)
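The conversion pattern can be sketched with plain asyncio. The service names and payloads below are illustrative stand-ins (the real services call NOAA over HTTP with `httpx.AsyncClient`); the point is that the endpoint awaits both fetches concurrently instead of blocking on each in turn:

```python
import asyncio

async def fetch_weather():
    await asyncio.sleep(0.05)   # stands in for an async HTTP GET to NOAA
    return {"kp_index": 3}

async def fetch_tec():
    await asyncio.sleep(0.05)   # second, independent NOAA fetch
    return {"tec": 42.0}

async def get_gps_reliability():
    # Both fetches run concurrently; total wait is roughly the max of the
    # two latencies, not their sum, and the event loop stays free throughout.
    weather, tec = await asyncio.gather(fetch_weather(), fetch_tec())
    return {"weather": weather, "tec": tec}
```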
Create a shared connection pool initialized at worker startup:

```python
# app/worker/db.py (NEW FILE)
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from celery.signals import worker_process_init

from app.core.config import settings  # adjust to the project's actual settings module

_engine = None
_SessionLocal = None

@worker_process_init.connect
def init_worker_db(**kwargs):
    global _engine, _SessionLocal
    sync_url = settings.DATABASE_URL.replace("+asyncpg", "")
    _engine = create_engine(sync_url, pool_size=5, max_overflow=10, pool_pre_ping=True)
    _SessionLocal = sessionmaker(bind=_engine)

def get_session():
    if _SessionLocal is None:
        raise RuntimeError("Worker DB not initialized")
    return _SessionLocal()
```

Push usage events to a Redis list and batch-process them every 10 seconds:
- Modify `get_valid_api_key()` to push to Redis instead of a DB write
- Create a new Celery task `flush_usage_events()` for batch processing
- Add it to the Celery Beat schedule (every 10 seconds)
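The buffered write path can be sketched as follows. The function names mirror the plan above but the bodies are illustrative; an in-memory stand-in replaces the Redis list so the sketch runs without a server, and the real flush task would finish with one batched DB UPDATE per API key (omitted here):

```python
import json
import time

class InMemoryRedis:
    """Minimal stand-in for a Redis list so the sketch runs without a server."""
    def __init__(self):
        self._lists = {}
    def lpush(self, key, value):
        self._lists.setdefault(key, []).insert(0, value)
    def rpop(self, key):
        items = self._lists.get(key)
        return items.pop() if items else None

USAGE_KEY = "usage:events"

def record_usage(redis, api_key_id):
    # Hot request path: O(1) push, no DB round-trip per request.
    redis.lpush(USAGE_KEY, json.dumps({"api_key_id": api_key_id, "ts": time.time()}))

def flush_usage_events(redis, batch_size=1000):
    # Celery Beat task body: drain up to batch_size events (oldest first),
    # aggregate counts per key, then apply one batched DB update per key.
    counts = {}
    for _ in range(batch_size):
        raw = redis.rpop(USAGE_KEY)
        if raw is None:
            break
        event = json.loads(raw)
        counts[event["api_key_id"]] = counts.get(event["api_key_id"], 0) + 1
    return counts
```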
Convert global singletons to proper dependency injection with @lru_cache for stateless services.
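A minimal sketch of the pattern, assuming FastAPI-style dependency providers (the service body is a placeholder, not the project's real class):

```python
from functools import lru_cache

class PredictionService:
    """Illustrative stand-in for the real stateless service."""
    def predict(self, lat, lon):
        return 0.5  # placeholder score

@lru_cache
def get_prediction_service():
    # One cached instance per process (singleton behavior preserved), but
    # tests can still swap it out via FastAPI's
    # app.dependency_overrides[get_prediction_service] = lambda: FakeService()
    return PredictionService()
```

Unlike a module-level global, the provider function is the single injection point, so nothing else imports the instance directly.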
tests/
├── conftest.py # Shared fixtures
├── unit/
│ ├── test_prediction_service.py
│ ├── test_geomagnetic_service.py
│ └── test_risk_calculation.py
├── integration/
│ ├── test_auth_flow.py
│ ├── test_api_endpoints.py
│ └── test_rate_limiting.py
└── fixtures/
├── noaa_responses.json
└── sample_api_keys.py
- Fix CORS to whitelist specific origins
- Add security headers middleware
- Implement input validation on all endpoints
- Remove legacy `dev-secret-key` references
Implement cache-aside pattern with Redis for GPS reliability responses.
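A sketch of the cache-aside read path. The key format, TTL, and dict-backed cache stand-in are assumptions for illustration, not the project's actual values:

```python
import json

class FakeCache:
    """Dict-backed stand-in for Redis so the sketch runs without a server."""
    def __init__(self):
        self._data = {}
    def get(self, key):
        return self._data.get(key)
    def setex(self, key, ttl, value):
        self._data[key] = value  # TTL ignored by the stand-in

CACHE_TTL_S = 60  # assumed TTL; tune to how fast the underlying data changes

def get_gps_reliability_cached(cache, lat, lon, compute):
    # Round coordinates so nearby requests share a cache entry.
    key = f"gps:{round(lat, 2)}:{round(lon, 2)}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)              # cache hit: skip computation
    result = compute(lat, lon)                 # cache miss: compute fresh
    cache.setex(key, CACHE_TTL_S, json.dumps(result))
    return result
```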
Replace slowapi with Redis-backed token bucket for burst protection.
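The token-bucket idea can be sketched in pure Python; a production version would execute the same refill-and-consume step atomically in Redis (typically via a Lua script) so all replicas share one bucket:

```python
import time

class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity      # start full
        self.last = time.monotonic()

    def allow(self, cost=1.0):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Capacity bounds the burst while rate bounds the sustained throughput, which is the property slowapi's fixed windows lack.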
- Add indices on frequently queried columns
- Configure connection pooling for production load
- Implement query optimization
Recommended stack (avoid K8s initially):
| Service | Provider | Purpose |
|---|---|---|
| API | Render / Railway | FastAPI (3 replicas) |
| Frontend | Vercel | Next.js edge |
| Database | Supabase / Neon | Managed Postgres |
| Cache | Upstash | Managed Redis |
| Worker | Render Background | Celery workers |
- Structured logging (JSON format)
- Prometheus metrics
- Sentry error tracking
- Health check endpoints
| Metric | Target |
|---|---|
| API uptime | 99.9% |
| p50 latency | < 100ms |
| p99 latency | < 500ms |
| Test coverage | >= 80% |
| Security vulnerabilities | 0 critical/high |
- Created production roadmap
- Identified critical blocking I/O issues
- COMPLETED: Convert all API endpoints to async def
  - `app/api/endpoints.py` - All 5 endpoints now async
  - Concurrent I/O with `asyncio.gather()` for weather + TEC fetches
- COMPLETED: Make services async
  - `app/services/realtime_service.py` - Using `httpx.AsyncClient`
  - `app/services/ionosphere_service.py` - Using `httpx.AsyncClient`
  - `app/services/radiation_service.py` - Using `httpx.AsyncClient`
- COMPLETED: Fix Celery DB connection pooling
  - Created `app/worker/db.py` with shared connection pool
  - Uses `@worker_process_init` signal for pool initialization
  - Updated all tasks to use `get_session()` context manager
- COMPLETED: Fix CORS configuration
  - Environment-aware origin whitelist (dev vs prod)
  - Restricted methods and headers
- COMPLETED: Fix deprecated datetime.utcnow() calls
  - Replaced all with `datetime.now(timezone.utc)`
  - Created helper `utc_now()` functions where needed
- COMPLETED: Decouple usage tracking to Redis
  - Usage events pushed to Redis list (non-blocking)
  - New `flush_usage_events` Celery task processes every 10s
  - Graceful fallback to sync DB update if Redis unavailable
- COMPLETED: Refactor singletons to proper DI
  - `@lru_cache` decorators for stateless services
  - `get_realtime_service()` accepts ionosphere dependency
  - Enables proper mocking via `dependency_overrides`
- COMPLETED: Create test infrastructure
  - `tests/conftest.py` with shared fixtures
  - `tests/unit/` with 37 passing tests
  - `tests/integration/` marked for DB-dependent tests
  - pytest configuration in pyproject.toml
- COMPLETED: Implement tier-based rate limiting
  - Created `app/core/rate_limit.py` with Redis-backed sliding window
  - Rate limits based on user tier (free: 60/min, business: 3000/min)
  - Response headers: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset
  - Graceful fallback to local rate limiting if Redis unavailable
  - Replaced slowapi decorators with dependency injection
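The sliding-window check can be sketched in pure Python. This is an illustration, not the code in `app/core/rate_limit.py`; a Redis-backed version would typically keep the timestamps in a sorted set and run ZREMRANGEBYSCORE, ZCARD, and ZADD in a pipeline:

```python
import time

class SlidingWindowLimiter:
    def __init__(self, limit, window_s=60.0):
        self.limit = limit          # max requests per window (e.g. 60 for free tier)
        self.window_s = window_s
        self._hits = {}             # key -> list of request timestamps

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        cutoff = now - self.window_s
        # Keep only timestamps still inside the window.
        hits = [t for t in self._hits.get(key, []) if t > cutoff]
        if len(hits) >= self.limit:
            self._hits[key] = hits
            return False            # over the limit: reject
        hits.append(now)
        self._hits[key] = hits
        return True
```

The `now` parameter exists so tests can drive the clock deterministically.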
- `f174f3f` - Refactor: Convert to async I/O and fix critical production issues
- `74612c6` - Decouple usage tracking to Redis + refactor singletons to DI
- `b1fd551` - Add pytest test infrastructure with 37 passing unit tests
- `5cfc7df` - Implement tier-based rate limiting with Redis backend
- `app/worker/db.py` - Shared Celery worker DB connection pool
- `app/core/rate_limit.py` - Tier-based rate limiting with Redis backend
- `tests/conftest.py` - Shared pytest fixtures
- `tests/unit/test_geomagnetic_service.py` - 23 geomagnetic calculation tests
- `tests/unit/test_risk_calculation.py` - 14 risk scoring tests
- `tests/integration/test_api_endpoints.py` - API endpoint tests
- `tests/integration/test_auth_flow.py` - Authentication flow tests
- `app/api/endpoints.py` - All endpoints now async, tier-based rate limiting
- `app/api/deps.py` - DI with @lru_cache, Redis usage tracking
- `app/services/realtime_service.py` - Async with httpx
- `app/services/ionosphere_service.py` - Async with httpx
- `app/services/radiation_service.py` - Async with httpx
- `app/worker/tasks.py` - Uses shared connection pool
- `app/worker/celery_app.py` - Added usage flush task schedule
- `app/core/redis.py` - Added lpush/lrange_and_trim for queues
- `app/main.py` - Removed slowapi, custom rate limit handler
- `pyproject.toml` - Added pytest config and dev dependencies
- 54 unit tests passing (geomagnetic + risk calculation + job service)
- Integration tests require database (marked for CI/CD)
- Job service persistence (use Redis or DB for job state) - DONE
- Observability stack (structured logging, Prometheus, Sentry) - DONE
- Deploy to production environment
- Structured JSON Logging: Created `app/core/logging.py` with correlation IDs
  - JSON format in production, human-readable in development
  - Request logging middleware with timing
  - X-Request-ID header propagation
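The JSON-formatting-with-correlation-ID idea can be sketched with the stdlib; the field names here are illustrative, not the exact schema of `app/core/logging.py`:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line, carrying the request correlation ID."""
    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Correlation ID attached by middleware via
            # logger.info(..., extra={"request_id": ...}); defaults to None.
            "request_id": getattr(record, "request_id", None),
        }
        return json.dumps(payload)
```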
- Prometheus Metrics: Created `app/core/metrics.py`
  - Request count, latency histograms
  - Business metrics (GPS reliability requests)
  - Service health gauges (Redis, models)
  - `/metrics` endpoint for Prometheus scraping
- Sentry Integration: Configured in `app/main.py`
  - Error tracking with environment context
  - Traces and profiles sampling
  - Enabled via `SENTRY_DSN` environment variable
- Enhanced Health Check: `/health` now checks dependencies
  - Redis connection status
  - ML models loaded status
  - Returns "healthy" or "degraded"
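The aggregation behind the healthy/degraded verdict can be sketched framework-free (the check names and response shape are illustrative, not the actual `/health` payload):

```python
def health_status(checks):
    """checks: mapping of dependency name -> callable returning True if healthy.

    A check that raises counts as unhealthy; the endpoint itself never fails.
    """
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    status = "healthy" if all(results.values()) else "degraded"
    return {"status": status, "checks": results}
```

Returning "degraded" with HTTP 200 (rather than erroring) lets load balancers keep routing while alerting still fires on the body.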
- Dependencies Added: prometheus-client, sentry-sdk[fastapi]