feat(serve): default semantic cache to Milvus in vllm-sr serve by asaadbalum · Pull Request #1713 · vllm-project/semantic-router

asaadbalum · 2026-04-04T21:56:31Z

Summary

Changes the default semantic cache backend from in-memory to Milvus across all three configuration layers (Go canonical defaults, Python CLI provisioning, Helm chart values), and adds automatic Milvus container lifecycle management to vllm-sr serve.

- Go defaults (`canonical_defaults.go`): `SemanticCache.BackendType` changed from `"memory"` to `"milvus"`.
- Python CLI: New `CANONICAL_STORE_DEFAULTS` dict and `inject_local_store_runtime_defaults()` function inject Milvus connection, collection schema, and development defaults into the runtime config. New `docker_start_milvus()` provisions a local Milvus container.
- Helm chart (`values.yaml`): Top-level `backend_type` updated to `"milvus"`. Profile-specific overrides untouched (they already explicitly set their own values).

Motivation

Issue #1710. The Go router's cache factory already supported Milvus (MilvusCacheType in cache_factory.go), but the default was memory — meaning users had to manually configure Milvus for persistent semantic caching. This PR makes Milvus the zero-configuration default for vllm-sr serve, matching the project's direction toward production-ready local development.

Key design decisions

- Store defaults parallel service defaults: The semantic cache lives at `global.stores.semantic_cache`, not under `global.services`. A new `CANONICAL_STORE_DEFAULTS` dict and `_effective_store_backend()` function mirror the existing service defaults pattern (`CANONICAL_SERVICE_DEFAULTS` / `effective_service_backend()`), keeping the two config namespaces cleanly separated.
- Three-block injection (connection + collection + development): The Go `MilvusConfig` struct requires not just connection details but also collection schema (name, vector field dimension, index type) and a development flag (`auto_create_collection`). Injecting only connection defaults causes a router crash ("collection name should not be empty"). All three blocks are injected as a unit.
- User overrides are always respected: If `backend_type: memory` is set explicitly, no Milvus container is provisioned. If `enabled: false` is set, the cache is skipped entirely. Existing user Milvus config is preserved (only missing fields are backfilled).
- Container lifecycle follows existing patterns: `docker_start_milvus()` uses the same reuse/replace, volume mount, and network attachment logic as the Redis and Postgres containers.
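
The "only missing fields are backfilled" rule for the three injection blocks can be sketched as a recursive merge. This is a minimal illustration, not the PR's code: `ILLUSTRATIVE_MILVUS_DEFAULTS`, `backfill()`, `inject_milvus_defaults()`, the `milvus` key, and every field name and value other than port 19530, `host.docker.internal`, and `auto_create_collection` are assumptions made for the example.

```python
from copy import deepcopy

# Hypothetical defaults. The three block names follow the PR description;
# the nested field names and most values are assumptions for illustration.
ILLUSTRATIVE_MILVUS_DEFAULTS = {
    "connection": {"host": "host.docker.internal", "port": 19530},
    "collection": {
        "name": "semantic_cache",
        "vector_field": {"dimension": 384},
        "index_type": "HNSW",
    },
    "development": {"auto_create_collection": True},
}


def backfill(target: dict, defaults: dict) -> dict:
    """Return a copy of target with keys from defaults added where missing.

    Values the user already set are never overwritten; nested dicts are
    merged key by key.
    """
    merged = deepcopy(target)
    for key, default_value in defaults.items():
        if key not in merged:
            merged[key] = deepcopy(default_value)
        elif isinstance(merged[key], dict) and isinstance(default_value, dict):
            merged[key] = backfill(merged[key], default_value)
    return merged


def inject_milvus_defaults(cache_config: dict) -> dict:
    """Backfill all three blocks as a unit, without mutating the input."""
    merged = deepcopy(cache_config)
    merged["milvus"] = backfill(
        merged.get("milvus") or {}, ILLUSTRATIVE_MILVUS_DEFAULTS
    )
    return merged


# A user who only named a custom collection still gets connection and
# development defaults, and keeps the custom name.
user_cache = {"milvus": {"collection": {"name": "my_cache"}}}
resolved = inject_milvus_defaults(user_cache)
print(resolved["milvus"]["collection"]["name"])  # my_cache
print(resolved["milvus"]["connection"]["port"])  # 19530
```

Merging all three blocks in one call is what avoids the partially configured state described above, where connection details exist but the collection name is empty.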

Test plan

- 12 unit tests pass (`test_storage_backends.py`): detection, injection, user overrides, disabled cache, bootstrap mode
- Full Python test suite: 218/218 pass
- Go `pkg/config` tests pass
- Full runtime validation: `vllm-sr serve` deploys a 7-container stack including Milvus v2.3.3, `runtime-config.yaml` verified with all injection blocks, Milvus healthy via the pymilvus SDK
- Override paths validated live: `memory` override skips Milvus, `enabled: false` skips Milvus
- `make agent-lint` / black / ruff clean
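
The override paths exercised in the test plan reduce to a small decision over `global.stores.semantic_cache`. The sketch below is an assumption about how such a check could look; `effective_cache_backend()` and `should_provision_milvus()` are hypothetical names and are not the PR's `_effective_store_backend()`.

```python
from typing import Optional


def effective_cache_backend(config: dict) -> Optional[str]:
    """Return the semantic cache backend to use, or None if it is disabled.

    Falls back to "milvus" when the user did not set backend_type.
    """
    stores = (config.get("global") or {}).get("stores") or {}
    cache = stores.get("semantic_cache") or {}
    if cache.get("enabled") is False:
        return None
    return cache.get("backend_type") or "milvus"


def should_provision_milvus(config: dict) -> bool:
    """Only start a local Milvus container when Milvus is the effective backend."""
    return effective_cache_backend(config) == "milvus"


# The three paths named in the test plan.
no_config = {}
memory_override = {"global": {"stores": {"semantic_cache": {"backend_type": "memory"}}}}
cache_disabled = {"global": {"stores": {"semantic_cache": {"enabled": False}}}}

print(should_provision_milvus(no_config))        # True
print(should_provision_milvus(memory_override))  # False
print(should_provision_milvus(cache_disabled))   # False
```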

Changed files (9)

| File | Change |
| --- | --- |
| `src/semantic-router/pkg/config/canonical_defaults.go` | `BackendType`: `"memory"` → `"milvus"` |
| `deploy/helm/semantic-router/values.yaml` | Top-level `backend_type`: `"memory"` → `"milvus"` |
| `src/vllm-sr/cli/consts.py` | Add `DEFAULT_MILVUS_PORT = 19530` |
| `src/vllm-sr/cli/runtime_stack.py` | Add `milvus_container_name`, `milvus_port`, `milvus_url` properties |
| `src/vllm-sr/cli/docker_services.py` | Add `docker_start_milvus()` (container lifecycle with data volume) |
| `src/vllm-sr/cli/storage_backends.py` | Wire Milvus into `provision_storage_backends()` |
| `src/vllm-sr/cli/service_defaults.py` | Add `CANONICAL_STORE_DEFAULTS`, `inject_local_store_runtime_defaults()`, store backend detection |
| `src/vllm-sr/cli/commands/runtime_support.py` | Call `inject_local_store_runtime_defaults()` in runtime setup |
| `src/vllm-sr/tests/test_storage_backends.py` | 6 new tests for Milvus detection, injection, overrides |

Closes #1710

netlify · 2026-04-04T21:56:37Z

✅ Deploy Preview for vllm-semantic-router ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | `978d73e` |
| 🔍 Latest deploy log | https://app.netlify.com/projects/vllm-semantic-router/deploys/69d2baad683a0700087f43e6 |
| 😎 Deploy Preview | https://deploy-preview-1713--vllm-semantic-router.netlify.app |

To edit notification comments on pull requests, go to your Netlify project configuration.

github-actions · 2026-04-04T21:56:48Z

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 `src/vllm-sr`

Owners: @Xunzhuo, @szedan-rh, @yehuditkerido, @henschwartz, @mkoushni, @liavweiss, @noalimoy, @haowu1234
Files changed:

src/vllm-sr/cli/commands/runtime_support.py
src/vllm-sr/cli/consts.py
src/vllm-sr/cli/docker_services.py
src/vllm-sr/cli/runtime_stack.py
src/vllm-sr/cli/service_defaults.py
src/vllm-sr/cli/storage_backends.py
src/vllm-sr/tests/test_storage_backends.py

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

github-actions · 2026-04-04T21:57:02Z

✅ Supply Chain Security Report — All Clear

| Scanner | Status | Findings |
| --- | --- | --- |
| AST Codebase Scan (Py, Go, JS/TS, Rust) | ✅ | 29 finding(s): MEDIUM: 23 · LOW: 6 |
| AST PR Diff Scan | ✅ | No issues detected |
| Regex Fallback Scan | ✅ | No issues detected |

Scanned at 2026-04-05T19:40:58.140Z · View full workflow logs

Change the vllm-sr serve CLI to provision Milvus as the default semantic cache backend instead of in-memory storage. The default is scoped to the Python CLI only: the Go router binary and Helm chart retain memory as their default so that deployments without a Milvus instance remain unaffected.

Key changes:

- Add CANONICAL_STORE_DEFAULTS for semantic_cache with backend_type milvus in the Python CLI service defaults
- Extend RuntimeStackLayout with milvus container name and port
- Add port-in-use detection to skip container provisioning when an external storage backend is already running on the expected port
- Use host.docker.internal for Milvus connection so the router can reach Milvus regardless of which container provides it
- Wire Milvus into detect/provision/inject pipeline alongside the existing Redis and Postgres backends
- Add unit tests for backend detection, config injection, port checking, and user override preservation

Signed-off-by: asaadbalum <asaad.balum@gmail.com>
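
The port-in-use detection mentioned in the commit message could be implemented with a plain TCP connect attempt. The sketch below is one possible approach under that assumption; `is_port_in_use()` and `should_start_container()` are hypothetical helpers, and only the port number 19530 comes from the PR (`DEFAULT_MILVUS_PORT`).

```python
import socket

DEFAULT_MILVUS_PORT = 19530  # value taken from the PR's consts.py change


def is_port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise,
        # so a refused connection does not raise.
        return sock.connect_ex((host, port)) == 0


def should_start_container(port: int = DEFAULT_MILVUS_PORT) -> bool:
    """Skip provisioning when a backend already listens on the expected port."""
    return not is_port_in_use(port)


if __name__ == "__main__":
    if should_start_container():
        print("port is free: a local Milvus container would be started")
    else:
        print("port is busy: reusing the backend that is already running")
```

A connect check only shows that some process is listening; it does not confirm that the listener is actually Milvus, so a real implementation might follow it with a protocol-level health check.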

asaadbalum requested review from Xunzhuo and rootfs as code owners April 4, 2026 21:56

github-actions bot assigned abdallahsamabd, asaadbalum, haowu1234, liavweiss, noalimoy, rootfs, samzong, szedan-rh, Xunzhuo and yehuditkerido Apr 4, 2026

asaadbalum force-pushed the feat/issue-1710-default-milvus-semantic-cache branch 2 times, most recently from 68f03ad to e11c942 Compare April 5, 2026 19:10

asaadbalum force-pushed the feat/issue-1710-default-milvus-semantic-cache branch from e11c942 to 978d73e Compare April 5, 2026 19:40

asaadbalum wants to merge 1 commit into vllm-project:main from asaadbalum:feat/issue-1710-default-milvus-semantic-cache
