feat(serve): default semantic cache to Milvus in vllm-sr serve #1713

Open
asaadbalum wants to merge 1 commit into vllm-project:main from asaadbalum:feat/issue-1710-default-milvus-semantic-cache
asaadbalum (Collaborator) commented Apr 4, 2026

Summary

Changes the default semantic cache backend from in-memory to Milvus across all three configuration layers (Go canonical defaults, Python CLI provisioning, Helm chart values), and adds automatic Milvus container lifecycle management to vllm-sr serve.

  • Go defaults (canonical_defaults.go): SemanticCache.BackendType changed from "memory" to "milvus".
  • Python CLI: New CANONICAL_STORE_DEFAULTS dict and inject_local_store_runtime_defaults() function inject Milvus connection, collection schema, and development defaults into the runtime config. New docker_start_milvus() provisions a local Milvus container.
  • Helm chart (values.yaml): Top-level backend_type updated to "milvus". Profile-specific overrides untouched (they already explicitly set their own values).

Motivation

Issue #1710. The Go router's cache factory already supported Milvus (MilvusCacheType in cache_factory.go), but the default was memory — meaning users had to manually configure Milvus for persistent semantic caching. This PR makes Milvus the zero-configuration default for vllm-sr serve, matching the project's direction toward production-ready local development.

Key design decisions

  • Store defaults parallel service defaults: The semantic cache lives at global.stores.semantic_cache, not under global.services. A new CANONICAL_STORE_DEFAULTS dict and _effective_store_backend() function mirror the existing service defaults pattern (CANONICAL_SERVICE_DEFAULTS / effective_service_backend()), keeping the two config namespaces cleanly separated.
  • Three-block injection (connection + collection + development): The Go MilvusConfig struct requires not just connection details but also collection schema (name, vector field dimension, index type) and a development flag (auto_create_collection). Injecting only connection defaults causes a router crash ("collection name should not be empty"). All three blocks are injected as a unit.
  • User overrides are always respected: If backend_type: memory is set explicitly, no Milvus container is provisioned. If enabled: false, the cache is skipped entirely. Existing user Milvus config is preserved (only missing fields are backfilled).
  • Container lifecycle follows existing patterns: docker_start_milvus() uses the same reuse/replace, volume mount, and network attachment logic as Redis and Postgres containers.
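
The "only missing fields are backfilled" behaviour can be sketched as a recursive merge. This is a minimal illustration, not the code in this PR: the helper name `backfill_defaults`, the dict `HYPOTHETICAL_MILVUS_DEFAULTS`, and every key and value inside the three blocks (host, collection name, dimension, index type) are assumptions. Only the three block names, `backend_type`, `auto_create_collection`, and the port 19530 come from the description.

```python
import copy

# Hypothetical defaults. The three block names follow the PR description;
# the individual keys and values are illustrative assumptions.
HYPOTHETICAL_MILVUS_DEFAULTS = {
    "backend_type": "milvus",
    "milvus": {
        "connection": {"host": "localhost", "port": 19530},
        "collection": {
            "name": "semantic_cache",
            "dimension": 384,
            "index_type": "HNSW",
        },
        "development": {"auto_create_collection": True},
    },
}


def backfill_defaults(user: dict, defaults: dict) -> dict:
    """Return a copy of `user` with keys from `defaults` added where missing.

    Values the user already set are never overwritten; nested dicts are
    merged key by key.
    """
    merged = copy.deepcopy(user)
    for key, default_value in defaults.items():
        if key not in merged:
            merged[key] = copy.deepcopy(default_value)
        elif isinstance(merged[key], dict) and isinstance(default_value, dict):
            merged[key] = backfill_defaults(merged[key], default_value)
    return merged


# A user who only set a custom Milvus host keeps it and gains the rest.
user_cache = {"milvus": {"connection": {"host": "milvus.internal"}}}
merged = backfill_defaults(user_cache, HYPOTHETICAL_MILVUS_DEFAULTS)
```

Merging into a deep copy leaves the user's original config untouched, and treating the three blocks as one defaults tree means a partial user config cannot end up with connection details but no collection schema.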

Test plan

  • 12 unit tests pass (test_storage_backends.py) — detection, injection, user overrides, disabled cache, bootstrap mode
  • Full Python test suite: 218/218 pass
  • Go pkg/config tests pass
  • Full runtime validation: vllm-sr serve deploys 7-container stack including Milvus v2.3.3, runtime-config.yaml verified with all injection blocks, Milvus healthy via pymilvus SDK
  • Override paths validated live: memory override skips Milvus, enabled: false skips Milvus
  • make agent-lint / black / ruff clean

Changed files (9)

| File | Change |
| --- | --- |
| src/semantic-router/pkg/config/canonical_defaults.go | BackendType: "memory" → "milvus" |
| deploy/helm/semantic-router/values.yaml | Top-level backend_type: "memory" → "milvus" |
| src/vllm-sr/cli/consts.py | Add DEFAULT_MILVUS_PORT = 19530 |
| src/vllm-sr/cli/runtime_stack.py | Add milvus_container_name, milvus_port, milvus_url properties |
| src/vllm-sr/cli/docker_services.py | Add docker_start_milvus(): container lifecycle with data volume |
| src/vllm-sr/cli/storage_backends.py | Wire Milvus into provision_storage_backends() |
| src/vllm-sr/cli/service_defaults.py | Add CANONICAL_STORE_DEFAULTS, inject_local_store_runtime_defaults(), store backend detection |
| src/vllm-sr/cli/commands/runtime_support.py | Call inject_local_store_runtime_defaults() in runtime setup |
| src/vllm-sr/tests/test_storage_backends.py | 6 new tests for Milvus detection, injection, overrides |

Closes #1710

netlify bot commented Apr 4, 2026

Deploy Preview for vllm-semantic-router ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 978d73e |
| 🔍 Latest deploy log | https://app.netlify.com/projects/vllm-semantic-router/deploys/69d2baad683a0700087f43e6 |
| 😎 Deploy Preview | https://deploy-preview-1713--vllm-semantic-router.netlify.app |


github-actions bot commented Apr 4, 2026

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 src/vllm-sr

Owners: @Xunzhuo, @szedan-rh, @yehuditkerido, @henschwartz, @mkoushni, @liavweiss, @noalimoy, @haowu1234
Files changed:

  • src/vllm-sr/cli/commands/runtime_support.py
  • src/vllm-sr/cli/consts.py
  • src/vllm-sr/cli/docker_services.py
  • src/vllm-sr/cli/runtime_stack.py
  • src/vllm-sr/cli/service_defaults.py
  • src/vllm-sr/cli/storage_backends.py
  • src/vllm-sr/tests/test_storage_backends.py

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.


github-actions bot commented Apr 4, 2026

✅ Supply Chain Security Report — All Clear

Scanner Status Findings
AST Codebase Scan (Py, Go, JS/TS, Rust) 29 finding(s) — MEDIUM: 23 · LOW: 6
AST PR Diff Scan No issues detected
Regex Fallback Scan No issues detected

Scanned at 2026-04-05T19:40:58.140Z · View full workflow logs

asaadbalum force-pushed the feat/issue-1710-default-milvus-semantic-cache branch 2 times, most recently from 68f03ad to e11c942 (April 5, 2026 19:10)
Change the vllm-sr serve CLI to provision Milvus as the default
semantic cache backend instead of in-memory storage.

The default is scoped to the Python CLI only — the Go router binary
and Helm chart retain memory as their default so that deployments
without a Milvus instance remain unaffected.

Key changes:
- Add CANONICAL_STORE_DEFAULTS for semantic_cache with backend_type
  milvus in the Python CLI service defaults
- Extend RuntimeStackLayout with milvus container name and port
- Add port-in-use detection to skip container provisioning when an
  external storage backend is already running on the expected port
- Use host.docker.internal for Milvus connection so the router can
  reach Milvus regardless of which container provides it
- Wire Milvus into detect/provision/inject pipeline alongside the
  existing Redis and Postgres backends
- Add unit tests for backend detection, config injection, port
  checking, and user override preservation

Signed-off-by: asaadbalum <asaad.balum@gmail.com>
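
The port-in-use detection mentioned in the commit message could look roughly like the following. This is a sketch under assumptions: the function name `is_port_in_use`, the loopback host, and the timeout are hypothetical, and the PR's actual check may work differently. It uses only the standard library `socket` module.

```python
import socket


def is_port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port.

    connect_ex returns 0 on a successful connect and an errno value
    otherwise, so a refused or timed-out connection yields False.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0


# Hypothetical usage: skip container provisioning when the default
# Milvus port (19530) is already served by an external instance.
MILVUS_PORT = 19530
should_start_container = not is_port_in_use(MILVUS_PORT)
```

A successful TCP connect only shows that some process is listening on the port, not that it is Milvus, so a stricter check would follow up with a protocol-level health probe.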
asaadbalum force-pushed the feat/issue-1710-default-milvus-semantic-cache branch from e11c942 to 978d73e (April 5, 2026 19:40)