feat(routerreplay): default store_backend to postgres for durable replay by yehuditkerido · Pull Request #1683 · vllm-project/semantic-router

yehuditkerido · 2026-03-29T12:12:04Z

Summary

Router Replay records were lost on every restart because store_backend defaulted to memory. Change the default to postgres for SQL-queryable audit storage, warn on memory, and add an E2E restart-recovery test.

Changes

Default RouterReplayConfig.StoreBackend from "memory" to "postgres" in canonical_defaults.go
Emit logging.Warnf when operator selects memory backend
Add Go doc to RouterReplayConfig and unit test for new default
Update website docs with Postgres config example and backend comparison table
Update state-taxonomy-and-inventory.md to reflect new default
New E2E profile router-replay-postgres with Postgres 16 deployment, restart-recovery test, and CI wiring
Add DoGETRequest fixture helper and register profile in imports.go
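
The actual change lives in Go (canonical_defaults.go and the logging.Warnf call). The Python sketch below only illustrates the default-and-warn behavior described in the list above. The names `resolve_router_replay` and `ttl_seconds` are hypothetical, and treating the TTL 2592000 from the unit test as seconds (30 days) is an assumption.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("router_replay_sketch")

# Defaults taken from the PR description; the TTL unit (seconds) is assumed.
DEFAULT_STORE_BACKEND = "postgres"
DEFAULT_TTL_SECONDS = 2_592_000  # 30 * 24 * 60 * 60


@dataclass
class RouterReplayConfig:
    """Illustrative stand-in for the Go RouterReplayConfig struct."""
    store_backend: str = DEFAULT_STORE_BACKEND
    ttl_seconds: int = DEFAULT_TTL_SECONDS


def resolve_router_replay(raw: dict) -> RouterReplayConfig:
    """Apply defaults to a raw config mapping and warn on the memory backend."""
    cfg = RouterReplayConfig(
        store_backend=raw.get("store_backend", DEFAULT_STORE_BACKEND),
        ttl_seconds=raw.get("ttl_seconds", DEFAULT_TTL_SECONDS),
    )
    if cfg.store_backend == "memory":
        # Mirrors the intent of the warning: memory storage is not durable.
        logger.warning(
            "router_replay store_backend=memory: records are lost on restart"
        )
    return cfg
```

An empty mapping resolves to the postgres backend, while an explicit `{"store_backend": "memory"}` is honored but logged as a warning.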

CLI auto-provisioning of storage backends

storage_backends.py: detect_required_backends reads global.services.<key>.store_backend from the loaded config and returns which backends (redis, postgres) need provisioning
docker_services.py: docker_start_redis and docker_start_postgres now call _reuse_running_storage_container before _replace_existing_container — if the storage container is already running it is kept as-is, preserving data across router restarts
config/vllm-sr-config-cli.yaml: migrated to v0.3 canonical format; sets response_api → redis, router_replay → postgres, adds router_replay plugin to default_route
docs/durable-router-replay-guide-he.html: end-to-end manual test guide covering auto-provisioning, replay verification in Postgres, and restart durability
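
As a rough illustration of the detection step described above, here is a minimal sketch assuming the loaded config is a plain nested mapping. The real signature and return type of detect_required_backends in storage_backends.py are not shown in this PR description, so the set return value and the `PROVISIONABLE_BACKENDS` constant are assumptions.

```python
from typing import Any, Mapping

# Backends the CLI provisions according to the PR description.
PROVISIONABLE_BACKENDS = {"redis", "postgres"}


def detect_required_backends(config: Mapping[str, Any]) -> set[str]:
    """Collect store_backend values found under global.services.<key>."""
    global_section = config.get("global")
    if not isinstance(global_section, Mapping):
        return set()
    services = global_section.get("services")
    if not isinstance(services, Mapping):
        return set()

    required: set[str] = set()
    for service in services.values():
        if not isinstance(service, Mapping):
            continue
        backend = service.get("store_backend")
        # Ignore backends that need no container, such as "memory".
        if isinstance(backend, str) and backend.lower() in PROVISIONABLE_BACKENDS:
            required.add(backend.lower())
    return required


# Shape assumed from the description of config/vllm-sr-config-cli.yaml.
example_config = {
    "global": {
        "services": {
            "response_api": {"store_backend": "redis"},
            "router_replay": {"store_backend": "postgres"},
        }
    }
}
```

With the example config, both redis and postgres are reported, so the CLI would start both containers before the router comes up.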

Testing

Unit: default backend assertion passes (postgres, TTL 2592000)
E2E: router-replay-restart-recovery passes - record survives pod restart
go vet and gofmt clean
Manual (CLI): vllm-sr serve with canonical v0.3 config auto-provisions Redis + Postgres, replay record persists after docker stop/rm of router containers and re-running vllm-sr serve
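
The persistence seen in the manual test depends on the CLI keeping an already-running storage container instead of recreating it. The sketch below captures that decision as a pure function. `ContainerAction` and `plan_storage_container` are hypothetical names, not the helpers in docker_services.py, and the handling of a stopped container is an assumption.

```python
from enum import Enum


class ContainerAction(Enum):
    REUSE = "reuse"      # keep the running container and its data
    REPLACE = "replace"  # remove the stale container, then start a fresh one
    CREATE = "create"    # no container yet, start a new one


def plan_storage_container(exists: bool, running: bool) -> ContainerAction:
    """Decide what to do with a Redis or Postgres container on `vllm-sr serve`."""
    if exists and running:
        # Reuse is checked first so data survives router restarts.
        return ContainerAction.REUSE
    if exists:
        # Assumed behavior: a stopped container is replaced.
        return ContainerAction.REPLACE
    return ContainerAction.CREATE
```

Checking for a running container before the replace path is what keeps replay records intact when only the router containers are stopped and removed.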

Note: router-replay-postgres is temporarily added to the CI baseline profiles
so the new E2E test runs on this PR. Happy to remove it in a follow-up commit once
reviewers see it pass — just let me know.

Related Issues

Resolves Router Replay portion of #1608
Follows #1661 (Response API → Redis)

netlify · 2026-03-29T12:12:10Z

✅ Deploy Preview for vllm-semantic-router ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | `652382a` |
| 🔍 Latest deploy log | https://app.netlify.com/projects/vllm-semantic-router/deploys/69cba8b16706b50008528409 |
| 😎 Deploy Preview | https://deploy-preview-1683--vllm-semantic-router.netlify.app |

To edit notification comments on pull requests, go to your Netlify project configuration.

github-actions · 2026-03-29T12:12:21Z

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 `Root Directory`

Owners: @rootfs, @Xunzhuo
Files changed:

.github/workflows/ci-changes.yml
.github/workflows/integration-test-k8s.yml
docs/agent/state-taxonomy-and-inventory.md

📁 `deploy`

Owners: @rootfs, @Xunzhuo
Files changed:

deploy/kubernetes/router-replay/postgres.yaml

📁 `e2e`

Owners: @Xunzhuo
Files changed:

e2e/README.md
e2e/pkg/fixtures/http.go
e2e/profiles/all/imports.go
e2e/profiles/router-replay-postgres/profile.go
e2e/profiles/router-replay-postgres/values.yaml
e2e/testcases/router_replay_restart_recovery.go

📁 `src`

Owners: @rootfs, @Xunzhuo, @wangchen615
Files changed:

src/semantic-router/pkg/config/canonical_defaults.go
src/semantic-router/pkg/config/canonical_loader_test.go
src/semantic-router/pkg/config/runtime_config.go
src/semantic-router/pkg/extproc/router_replay_setup.go
src/vllm-sr/cli/core.py
src/vllm-sr/cli/docker_cli.py
src/vllm-sr/cli/docker_services.py
src/vllm-sr/cli/runtime_lifecycle.py
src/vllm-sr/cli/runtime_stack.py
src/vllm-sr/cli/storage_backends.py

📁 `tools`

Owners: @yuluo-yx, @rootfs, @Xunzhuo
Files changed:

tools/agent/e2e-profile-map.yaml

📁 `website`

Owners: @Xunzhuo, @rootfs, @yuluo-yx
Files changed:

website/docs/tutorials/global/api-and-observability.md

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

github-actions · 2026-03-29T12:12:33Z

✅ Supply Chain Security Report — All Clear

| Scanner | Status | Findings |
|---|---|---|
| AST Codebase Scan (Py, Go, JS/TS, Rust) | ✅ | 29 finding(s) (MEDIUM: 22 · LOW: 7) |
| AST PR Diff Scan | ✅ | No issues detected |
| Regex Fallback Scan | ✅ | No issues detected |

Scanned at 2026-03-31T10:59:47.532Z · View full workflow logs

Xunzhuo

There are multiple places where we need a database when we run vllm-sr serve; can we unify this? Also, make sure that when we start vllm-sr, the default environment contains the relevant resources.

yehuditkerido · 2026-03-30T07:21:38Z

> There are multiple places where we need a database when we run vllm-sr serve; can we unify this? Also, make sure that when we start vllm-sr, the default environment contains the relevant resources.

Hi @Xunzhuo just want to make sure I understand correctly — you'd like the vllm-sr serve CLI to detect which storage backends the config requires (Redis, Postgres, etc.) and automatically start them as part of the environment, right?

If so, should I address that in this PR or open a follow-up issue for it?

Xunzhuo · 2026-03-30T12:26:28Z

@yehuditkerido yes. As it stands, this PR will break the vllm-sr serve installation process, because it changes the default for replay storage without adding a storage setup step to vllm-sr.

… replay

Router Replay records (routing decisions, model selections, guardrail results) were lost on every restart because the default was memory. Change the default to postgres, the right tool for structured audit data that needs SQL queryability and long-term retention. Warn operators who explicitly choose memory. Add a dedicated E2E profile with Postgres to validate restart-recovery.

Signed-off-by: Yehudit Kerido <ykerido@ykerido-thinkpadp1gen7.raanaii.csb>

yehuditkerido requested review from Xunzhuo and rootfs as code owners March 29, 2026 12:12

github-actions bot assigned rootfs, wangchen615, Xunzhuo and yuluo-yx Mar 29, 2026

yehuditkerido force-pushed the durable_router_replay branch from 0f836f1 to ce3429b Compare March 29, 2026 12:22

Xunzhuo reviewed Mar 29, 2026

yehuditkerido marked this pull request as draft March 29, 2026 12:49

yehuditkerido force-pushed the durable_router_replay branch from ce3429b to e5d8af3 Compare March 30, 2026 07:08

yehuditkerido force-pushed the durable_router_replay branch from e5d8af3 to c4b01e5 Compare March 30, 2026 08:12

yehuditkerido marked this pull request as ready for review March 30, 2026 08:28

yehuditkerido marked this pull request as draft March 30, 2026 13:51

yehuditkerido force-pushed the durable_router_replay branch from c4b01e5 to 8bcf9e1 Compare March 31, 2026 10:18

yehuditkerido marked this pull request as ready for review March 31, 2026 10:20

yehuditkerido force-pushed the durable_router_replay branch from 8bcf9e1 to ea0c268 Compare March 31, 2026 10:29

yehuditkerido force-pushed the durable_router_replay branch from ea0c268 to 652382a Compare March 31, 2026 10:57

rootfs approved these changes Mar 31, 2026

rootfs merged commit 6e91324 into vllm-project:main Mar 31, 2026
35 checks passed

rootfs merged 1 commit into vllm-project:main from yehuditkerido:durable_router_replay
