feat(serve): default semantic cache to Milvus in vllm-sr serve #1713
Open
asaadbalum wants to merge 1 commit into vllm-project:main from
Change the vllm-sr serve CLI to provision Milvus as the default semantic cache backend instead of in-memory storage. The default is scoped to the Python CLI only: the Go router binary and Helm chart retain memory as their default so that deployments without a Milvus instance remain unaffected.

Key changes:
- Add CANONICAL_STORE_DEFAULTS for semantic_cache with backend_type milvus in the Python CLI service defaults
- Extend RuntimeStackLayout with milvus container name and port
- Add port-in-use detection to skip container provisioning when an external storage backend is already running on the expected port
- Use host.docker.internal for Milvus connection so the router can reach Milvus regardless of which container provides it
- Wire Milvus into detect/provision/inject pipeline alongside the existing Redis and Postgres backends
- Add unit tests for backend detection, config injection, port checking, and user override preservation

Signed-off-by: asaadbalum <asaad.balum@gmail.com>
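The port-in-use detection mentioned in the commit message could look roughly like the sketch below. This is not the code from the PR: the helper names `port_in_use` and `should_provision_milvus` are hypothetical, and the check assumes that a successful TCP connect on the expected port means an external backend is already running. Only the default port 19530 comes from the PR itself.

```python
import socket

# Default Milvus port, as listed for consts.py in this PR.
DEFAULT_MILVUS_PORT = 19530


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


def should_provision_milvus(port: int = DEFAULT_MILVUS_PORT) -> bool:
    """Hypothetical policy: start a container only when the port is free."""
    return not port_in_use(port)
```

A connect-based probe only shows that some process is listening; it does not confirm that the listener is Milvus, so a real implementation may want an additional health check.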

Summary
Changes the default semantic cache backend from in-memory to Milvus across all three configuration layers (Go canonical defaults, Python CLI provisioning, Helm chart values), and adds automatic Milvus container lifecycle management to `vllm-sr serve`.

- Go canonical defaults (`canonical_defaults.go`): `SemanticCache.BackendType` changed from `"memory"` to `"milvus"`.
- Python CLI provisioning: the `CANONICAL_STORE_DEFAULTS` dict and `inject_local_store_runtime_defaults()` function inject Milvus connection, collection schema, and development defaults into the runtime config. New `docker_start_milvus()` provisions a local Milvus container.
- Helm chart values (`values.yaml`): Top-level `backend_type` updated to `"milvus"`. Profile-specific overrides untouched (they already explicitly set their own values).

Motivation
Issue #1710. The Go router's cache factory already supported Milvus (`MilvusCacheType` in `cache_factory.go`), but the default was `memory`, meaning users had to manually configure Milvus for persistent semantic caching. This PR makes Milvus the zero-configuration default for `vllm-sr serve`, matching the project's direction toward production-ready local development.

Key design decisions
- The semantic cache is configured under `global.stores.semantic_cache`, not under `global.services`. A new `CANONICAL_STORE_DEFAULTS` dict and `_effective_store_backend()` function mirror the existing service defaults pattern (`CANONICAL_SERVICE_DEFAULTS` / `effective_service_backend()`), keeping the two config namespaces cleanly separated.
- The `MilvusConfig` struct requires not just connection details but also collection schema (name, vector field dimension, index type) and a development flag (`auto_create_collection`). Injecting only connection defaults causes a router crash ("collection name should not be empty"). All three blocks are injected as a unit.
- If `backend_type: memory` is set explicitly, no Milvus container is provisioned. If `enabled: false`, the cache is skipped entirely. Existing user Milvus config is preserved (only missing fields are backfilled).
- `docker_start_milvus()` uses the same reuse/replace, volume mount, and network attachment logic as Redis and Postgres containers.

Test plan
- Unit tests (`test_storage_backends.py`): detection, injection, user overrides, disabled cache, bootstrap mode
- `pkg/config` tests pass
- `vllm-sr serve` deploys 7-container stack including Milvus v2.3.3, `runtime-config.yaml` verified with all injection blocks, Milvus healthy via pymilvus SDK
- `memory` override skips Milvus, `enabled: false` skips Milvus
- `make agent-lint` / black / ruff clean

Changed files (9)
| File | Change |
| --- | --- |
| `src/semantic-router/pkg/config/canonical_defaults.go` | `BackendType`: `"memory"` → `"milvus"` |
| `deploy/helm/semantic-router/values.yaml` | `backend_type`: `"memory"` → `"milvus"` |
| `src/vllm-sr/cli/consts.py` | `DEFAULT_MILVUS_PORT = 19530` |
| `src/vllm-sr/cli/runtime_stack.py` | `milvus_container_name`, `milvus_port`, `milvus_url` properties |
| `src/vllm-sr/cli/docker_services.py` | `docker_start_milvus()`: container lifecycle with data volume |
| `src/vllm-sr/cli/storage_backends.py` | `provision_storage_backends()` |
| `src/vllm-sr/cli/service_defaults.py` | `CANONICAL_STORE_DEFAULTS`, `inject_local_store_runtime_defaults()`, store backend detection |
| `src/vllm-sr/cli/commands/runtime_support.py` | `inject_local_store_runtime_defaults()` in runtime setup |
| `src/vllm-sr/tests/test_storage_backends.py` | |

Closes #1710
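For reviewers, the override rules from the design decisions (backfill only missing fields, skip provisioning on `backend_type: memory` or `enabled: false`) can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `EXAMPLE_STORE_DEFAULTS`, `backfill`, and `needs_milvus_container` are hypothetical names, and the nested key layout and the collection name are placeholders, since the actual shape of `CANONICAL_STORE_DEFAULTS` is not shown in this description.

```python
from copy import deepcopy
from typing import Any

# Placeholder defaults for illustration only. The real CANONICAL_STORE_DEFAULTS
# layout is not shown in the PR text, so these keys and values are assumptions.
EXAMPLE_STORE_DEFAULTS: dict[str, Any] = {
    "backend_type": "milvus",
    "milvus": {
        "connection": {"host": "host.docker.internal", "port": 19530},
        "collection": {"name": "example_cache"},  # hypothetical name
        "development": {"auto_create_collection": True},
    },
}


def backfill(user: dict, defaults: dict) -> dict:
    """Return a copy of `user` with missing keys filled in from `defaults`.

    Values the user already set are never overwritten; nested dicts are
    merged recursively.
    """
    merged = deepcopy(user)
    for key, value in defaults.items():
        if key not in merged:
            merged[key] = deepcopy(value)
        elif isinstance(merged[key], dict) and isinstance(value, dict):
            merged[key] = backfill(merged[key], value)
    return merged


def needs_milvus_container(cache_cfg: dict) -> bool:
    """Hypothetical check: provision Milvus unless disabled or overridden."""
    if cache_cfg.get("enabled") is False:
        return False
    return cache_cfg.get("backend_type", "milvus") == "milvus"
```

With these definitions, a user config that sets only a custom Milvus host keeps that host and picks up the default port, collection, and development blocks, while `{"backend_type": "memory"}` and `{"enabled": False}` both skip container provisioning.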