feat(routerreplay): default store_backend to postgres for durable replay #1683
Xunzhuo
left a comment
There are multiple places where we need a database when we run vllm-sr serve. Can we unify this? Also, make sure that when we start vllm-sr, the default environment contains the relevant resources.
Hi @Xunzhuo, just want to make sure I understand correctly: you'd like the vllm-sr serve CLI to detect which storage backends the config requires (Redis, Postgres, etc.) and automatically start them as part of the environment, right? If so, should I address that in this PR or open a follow-up issue for it?
@yehuditkerido yes. As it stands, this PR will break the vllm-sr serve installation process, since you changed the default for replay storage but did not add a storage setup step to vllm-sr.
… replay

Router Replay records (routing decisions, model selections, guardrail results) were lost on every restart because the default was memory. Change the default to postgres, the right tool for structured audit data that needs SQL queryability and long-term retention. Warn operators who explicitly choose memory. Add a dedicated E2E profile with Postgres to validate restart-recovery.

Signed-off-by: Yehudit Kerido <ykerido@ykerido-thinkpadp1gen7.raanaii.csb>

Summary
Router Replay records were lost on every restart because store_backend defaulted to memory. Change the default to postgres for SQL-queryable audit storage, warn on memory, and add an E2E restart-recovery test.

Changes
- RouterReplayConfig.StoreBackend from "memory" to "postgres" in canonical_defaults.go
- logging.Warnf when operator selects memory backend
- RouterReplayConfig and unit test for new default
- state-taxonomy-and-inventory.md to reflect new default
- router-replay-postgres with Postgres 16 deployment, restart-recovery test, and CI wiring
- DoGETRequest fixture helper and register profile in imports.go

CLI auto-provisioning of storage backends
- storage_backends.py: detect_required_backends reads global.services.<key>.store_backend from the loaded config and returns which backends (redis, postgres) need provisioning
- docker_services.py: docker_start_redis and docker_start_postgres now call _reuse_running_storage_container before _replace_existing_container; if the storage container is already running, it is kept as-is, preserving data across router restarts
- config/vllm-sr-config-cli.yaml: migrated to v0.3 canonical format; sets response_api → redis, router_replay → postgres, adds router_replay plugin to default_route
- docs/durable-router-replay-guide-he.html: end-to-end manual test guide covering auto-provisioning, replay verification in Postgres, and restart durability

Testing
- postgres, TTL 2592000)
- router-replay-restart-recovery passes - record survives pod restart
- go vet and gofmt clean
- vllm-sr serve with canonical v0.3 config auto-provisions Redis + Postgres, replay record persists after docker stop/rm of router containers and re-running vllm-sr serve

Related Issues
Resolves Router Replay portion of #1608
Follows #1661 (Response API → Redis)
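
For readers who want a concrete picture of the CLI auto-provisioning described above, here is a minimal sketch, not the code from this PR. It assumes the loaded config is a nested dict shaped like global.services.<key>.store_backend. The function name detect_required_backends is borrowed from the description, but its body is a guess at the behavior. PROVISIONABLE_BACKENDS, plan_container_action, and the "running" / "exited" state strings are hypothetical names introduced only for illustration.

```python
from typing import Any, Mapping, Optional, Set

# Assumption: the CLI only knows how to provision these two backends.
PROVISIONABLE_BACKENDS = ("redis", "postgres")


def detect_required_backends(config: Mapping[str, Any]) -> Set[str]:
    """Collect backends named under global.services.<key>.store_backend."""
    services = (config.get("global") or {}).get("services") or {}
    required: Set[str] = set()
    for service_cfg in services.values():
        if not isinstance(service_cfg, Mapping):
            continue  # skip scalar or malformed service entries
        backend = str(service_cfg.get("store_backend", "")).strip().lower()
        if backend in PROVISIONABLE_BACKENDS:
            required.add(backend)  # e.g. "memory" needs no container
    return required


def plan_container_action(container_state: Optional[str]) -> str:
    """Hypothetical decision helper mirroring 'reuse before replace'.

    container_state is an assumed value: "running", any other state
    string such as "exited", or None when no container exists.
    """
    if container_state == "running":
        return "reuse"  # keep as-is so stored data survives router restarts
    if container_state is None:
        return "create"  # assumption: nothing to replace, start fresh
    return "replace"  # stopped or stale container: remove and recreate


if __name__ == "__main__":
    example_config = {
        "global": {
            "services": {
                "response_api": {"store_backend": "redis"},
                "router_replay": {"store_backend": "postgres"},
            }
        }
    }
    for backend in sorted(detect_required_backends(example_config)):
        print(backend, plan_container_action("running"))
```

Keeping the reuse check ahead of the replace step is what lets a second vllm-sr serve run find the replay records written by the first one; the sketch only models that decision and does not call Docker.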