Commit ce3429b

Author: Yehudit Kerido

feat(routerreplay): default store_backend to postgres for audit-grade replay

Router Replay records (routing decisions, model selections, guardrail results) were lost on every restart because the default was memory. Change the default to postgres, the right tool for structured audit data that needs SQL queryability and long-term retention. Warn operators who explicitly choose memory. Add a dedicated E2E profile with Postgres to validate restart-recovery.

Signed-off-by: Yehudit Kerido <ykerido@ykerido-thinkpadp1gen7.raanaii.csb>
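
The behavior the commit message describes (an unset backend falls back to postgres, and an explicit memory choice produces a warning) could look roughly like the sketch below. This is an illustration only: `resolveStoreBackend`, `defaultStoreBackend`, and the warning text are hypothetical names invented for this note, not the router's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultStoreBackend mirrors the new default described in the commit message.
const defaultStoreBackend = "postgres"

// resolveStoreBackend is a hypothetical helper: it returns the backend to use
// and a non-empty warning when the operator explicitly selected memory.
func resolveStoreBackend(configured string) (backend string, warning string) {
	b := strings.ToLower(strings.TrimSpace(configured))
	if b == "" {
		// Nothing configured: fall back to the durable default.
		return defaultStoreBackend, ""
	}
	if b == "memory" {
		return b, "router_replay store_backend=memory: replay records are lost on restart"
	}
	return b, ""
}

func main() {
	for _, cfg := range []string{"", "memory", "redis"} {
		backend, warning := resolveStoreBackend(cfg)
		fmt.Printf("configured=%q backend=%s warning=%q\n", cfg, backend, warning)
	}
}
```

Returning the warning as a value, instead of logging inside the helper, keeps the decision easy to test; the caller decides how to surface it at startup.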
1 parent 761dac1 commit ce3429b

15 files changed, with 564 additions and 4 deletions


.github/workflows/ci-changes.yml

Lines changed: 8 additions & 0 deletions
@@ -47,6 +47,8 @@ on:
         value: ${{ jobs.filter.outputs.e2e_response_api_redis }}
       e2e_response_api_redis_cluster:
         value: ${{ jobs.filter.outputs.e2e_response_api_redis_cluster }}
+      e2e_router_replay_postgres:
+        value: ${{ jobs.filter.outputs.e2e_router_replay_postgres }}
       e2e_ml_model_selection:
         value: ${{ jobs.filter.outputs.e2e_ml_model_selection }}
       e2e_multi_endpoint:
@@ -85,6 +87,7 @@ jobs:
       e2e_response_api: ${{ steps.changes.outputs.e2e_response_api }}
       e2e_response_api_redis: ${{ steps.changes.outputs.e2e_response_api_redis }}
       e2e_response_api_redis_cluster: ${{ steps.changes.outputs.e2e_response_api_redis_cluster }}
+      e2e_router_replay_postgres: ${{ steps.changes.outputs.e2e_router_replay_postgres }}
       e2e_ml_model_selection: ${{ steps.changes.outputs.e2e_ml_model_selection }}
       e2e_multi_endpoint: ${{ steps.changes.outputs.e2e_multi_endpoint }}
       e2e_authz_rbac: ${{ steps.changes.outputs.e2e_authz_rbac }}
@@ -206,6 +209,11 @@ jobs:
             e2e_response_api_redis_cluster:
               - 'e2e/profiles/response-api-redis-cluster/**'
               - 'deploy/kubernetes/response-api/redis-cluster.yaml'
+            e2e_router_replay_postgres:
+              - 'e2e/profiles/router-replay-postgres/**'
+              - 'deploy/kubernetes/router-replay/**'
+              - 'src/semantic-router/pkg/routerreplay/**'
+              - 'src/semantic-router/pkg/extproc/router_replay_setup.go'
             e2e_ml_model_selection:
               - 'e2e/profiles/ml-model-selection/**'
               - 'src/semantic-router/pkg/modelselection/**'

.github/workflows/integration-test-k8s.yml

Lines changed: 2 additions & 1 deletion
@@ -42,7 +42,7 @@ jobs:
              [[ "${{ needs.changes.outputs.agent_exec }}" == "true" ]] || \
              [[ "${{ github.event_name }}" == "schedule" ]] || \
              [[ "${{ github.event_name }}" == "workflow_dispatch" ]]; then
-            echo 'profiles=["kubernetes", "dashboard"]' >> $GITHUB_OUTPUT
+            echo 'profiles=["kubernetes", "dashboard", "router-replay-postgres"]' >> $GITHUB_OUTPUT
             echo 'should_run=true' >> $GITHUB_OUTPUT
             echo "Running default baseline profiles due to common/core changes or push/schedule/manual trigger"
             exit 0
@@ -62,6 +62,7 @@ jobs:
           [[ "${{ needs.changes.outputs.e2e_multi_endpoint }}" == "true" ]] && profiles+=("multi-endpoint")
           [[ "${{ needs.changes.outputs.e2e_authz_rbac }}" == "true" ]] && profiles+=("authz-rbac")
           [[ "${{ needs.changes.outputs.e2e_streaming }}" == "true" ]] && profiles+=("streaming")
+          [[ "${{ needs.changes.outputs.e2e_router_replay_postgres }}" == "true" ]] && profiles+=("router-replay-postgres")
 
           # Convert to JSON array
           if [ ${#profiles[@]} -eq 0 ]; then

deploy/kubernetes/router-replay/postgres.yaml

Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: postgres
+  namespace: default
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      app: postgres
+  template:
+    metadata:
+      labels:
+        app: postgres
+    spec:
+      containers:
+        - name: postgres
+          image: postgres:16-alpine
+          imagePullPolicy: IfNotPresent
+          ports:
+            - containerPort: 5432
+              name: postgres
+              protocol: TCP
+          env:
+            - name: POSTGRES_DB
+              value: vsr
+            - name: POSTGRES_USER
+              value: router
+            - name: POSTGRES_PASSWORD
+              value: router-secret
+          readinessProbe:
+            exec:
+              command:
+                - pg_isready
+                - -U
+                - router
+                - -d
+                - vsr
+            initialDelaySeconds: 3
+            periodSeconds: 3
+---
+apiVersion: v1
+kind: Service
+metadata:
+  name: postgres
+  namespace: default
+spec:
+  selector:
+    app: postgres
+  ports:
+    - name: postgres
+      port: 5432
+      targetPort: 5432
+      protocol: TCP

docs/agent/state-taxonomy-and-inventory.md

Lines changed: 1 addition & 2 deletions
@@ -52,7 +52,7 @@ Use it to answer three questions before adding or changing a stateful feature:
 | Surface | Primary owner | Current backend / default | Current durability class | Restart behavior today | Scale risk | Recommended direction |
 | --- | --- | --- | --- | --- | --- | --- |
 | Response API stored responses and conversations | router runtime, `src/semantic-router/pkg/responsestore/**` | Default `redis`; optional `memory` for local dev only | `shared_durable_workflow_state` | Response and conversation history survives restart when using the default Redis backend. The `memory` backend emits a startup warning and loses all data on restart. | Replica-local only when `memory` is explicitly selected; Redis backend is shared across replicas | Keep metadata and conversation chain in a durable server-owned store by default for product use. Prefer relational storage for metadata and queryability; keep large payloads in blob/object storage only if needed later. |
-| Router replay records | router runtime, `src/semantic-router/pkg/routerreplay/**`, `src/semantic-router/pkg/extproc/router_replay_setup.go` | Default `memory`; optional Redis, Postgres, Milvus | `audit_analytics_telemetry` presented as `ephemeral_request_state` by default | Restart drops replay history when default backend is used | Debuggability and audit posture degrade under restart and multi-replica routing | Prefer Postgres for durable operator-facing replay history. Keep Redis only for transient debug buffers and Milvus only when semantic replay search is explicitly needed. |
+| Router replay records | router runtime, `src/semantic-router/pkg/routerreplay/**`, `src/semantic-router/pkg/extproc/router_replay_setup.go` | Default `postgres`; optional `redis`, `milvus`, `memory` (local dev only) | `audit_analytics_telemetry` with `shared_durable_workflow_state` default | Replay history survives restart when using the default Postgres backend. The `memory` backend emits a startup warning and loses all records on restart. | Postgres provides SQL queryability for audit and compliance. Redis available for lightweight deployments. | Keep metadata and replay records in a durable server-owned store by default. Postgres is the default for long-term audit retention and compliance. Keep Milvus only when semantic replay search is explicitly needed. |
 | Semantic cache entries | router runtime, `src/semantic-router/pkg/cache/**` | Default `memory`; optional Redis, Milvus, hybrid | `ephemeral_request_state` in local dev; shared cache in scaled deploys | Restart flushes cache; replicas do not share hot entries by default | Cold-start latency, inconsistent cache hit rates, and uneven behavior across replicas | Keep this as cache, not a database table. Prefer Redis or hybrid shared backends for scaled deployments; document memory backend as local/dev or single-node only. |
 | RAG retrieval result cache | router runtime, `src/semantic-router/pkg/extproc/req_filter_rag_cache.go` | Process-wide singleton in-memory LRU with TTL | `ephemeral_request_state` | Restart flushes cache; cache is global per process, not per tenant or replica | Hidden shared mutable state, no observability, no durability, and no multi-replica coherence | Keep as optional cache only. Move to a pluggable shared cache backend if this becomes performance-critical, or document as local process optimization. |
 | Agentic memory vectors | router runtime, `src/semantic-router/pkg/memory/**` | Disabled by default; vector content leans on Milvus config when enabled | `shared_durable_workflow_state` when enabled | Depends on backend choice; not enabled by default | Product semantics remain ambiguous between experimental memory and supported user data | Keep vector embeddings in Milvus or another vector store, but pair them with explicit metadata and lifecycle ownership in a durable server-owned contract. |
@@ -78,7 +78,6 @@ Use it to answer three questions before adding or changing a stateful feature:
 
 ## Default Memory-Backed Surfaces To Treat As High Risk
 
-- `global.services.router_replay.store_backend = memory`
 - `global.stores.semantic_cache.backend_type = memory`
 - `global.stores.vector_store.backend_type = memory` in dashboard defaults when enabled
 - RAG `cache_results` in `src/semantic-router/pkg/config/rag_plugin.go`

e2e/pkg/fixtures/http.go

Lines changed: 5 additions & 0 deletions
@@ -24,6 +24,11 @@ func (r *HTTPResponse) DecodeJSON(v any) error {
 	return nil
 }
 
+// DoGETRequest sends a GET request and returns the raw HTTP response.
+func DoGETRequest(ctx context.Context, httpClient *http.Client, url string) (*HTTPResponse, error) {
+	return doJSONRequest(ctx, httpClient, http.MethodGet, url, nil, nil)
+}
+
 func doJSONRequest(
 	ctx context.Context,
 	httpClient *http.Client,

e2e/profiles/all/imports.go

Lines changed: 6 additions & 0 deletions
@@ -17,6 +17,7 @@ import (
 	responseapi "github.com/vllm-project/semantic-router/e2e/profiles/response-api"
 	responseapiredis "github.com/vllm-project/semantic-router/e2e/profiles/response-api-redis"
 	responseapirediscluster "github.com/vllm-project/semantic-router/e2e/profiles/response-api-redis-cluster"
+	routerreplaypostgres "github.com/vllm-project/semantic-router/e2e/profiles/router-replay-postgres"
 	routingstrategies "github.com/vllm-project/semantic-router/e2e/profiles/routing-strategies"
 	streaming "github.com/vllm-project/semantic-router/e2e/profiles/streaming"
 )
@@ -61,6 +62,11 @@ func init() {
 		func() framework.Profile { return responseapirediscluster.NewProfile() },
 		framework.ProfileCapabilities{LocalImages: mockVLLMLocalImages},
 	)
+	register(
+		"router-replay-postgres",
+		func() framework.Profile { return routerreplaypostgres.NewProfile() },
+		framework.ProfileCapabilities{LocalImages: mockVLLMLocalImages},
+	)
 	register("routing-strategies", func() framework.Profile { return routingstrategies.NewProfile() }, framework.ProfileCapabilities{})
 	register("streaming", func() framework.Profile { return streaming.NewProfile() }, framework.ProfileCapabilities{})
 }

Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
+package routerreplaypostgres
+
+import (
+	"context"
+
+	"github.com/vllm-project/semantic-router/e2e/pkg/framework"
+	gatewaystack "github.com/vllm-project/semantic-router/e2e/pkg/stacks/gateway"
+)
+
+const (
+	valuesFile       = "e2e/profiles/router-replay-postgres/values.yaml"
+	postgresManifest = "deploy/kubernetes/router-replay/postgres.yaml"
+)
+
+var resourceManifests = []string{
+	"deploy/kubernetes/response-api/mock-vllm.yaml",
+	"deploy/kubernetes/response-api/gwapi-resources.yaml",
+}
+
+// Profile implements the Router Replay Postgres test profile.
+type Profile struct {
+	stack *gatewaystack.Stack
+}
+
+// NewProfile creates a new Router Replay Postgres profile.
+func NewProfile() *Profile {
+	return &Profile{
+		stack: gatewaystack.New(gatewaystack.Config{
+			Name:                     "router-replay-postgres",
+			SemanticRouterValuesFile: valuesFile,
+			PrerequisiteManifests:    []string{postgresManifest},
+			ResourceManifests:        resourceManifests,
+		}),
+	}
+}
+
+// Name returns the profile name.
+func (p *Profile) Name() string {
+	return "router-replay-postgres"
+}
+
+// Description returns the profile description.
+func (p *Profile) Description() string {
+	return "Tests Router Replay restart recovery using the default Postgres backend"
+}
+
+// Setup deploys Postgres, the router, and gateway resources.
+func (p *Profile) Setup(ctx context.Context, opts *framework.SetupOptions) error {
+	return p.stack.Setup(ctx, opts)
+}
+
+// Teardown removes the stack.
+func (p *Profile) Teardown(ctx context.Context, opts *framework.TeardownOptions) error {
+	return p.stack.Teardown(ctx, opts)
+}
+
+// GetTestCases returns the test cases for this profile.
+func (p *Profile) GetTestCases() []string {
+	return []string{
+		"router-replay-restart-recovery",
+	}
+}
+
+// GetServiceConfig returns the service configuration for accessing the deployed service.
+func (p *Profile) GetServiceConfig() framework.ServiceConfig {
+	return p.stack.ServiceConfig()
+}

e2e/profiles/router-replay-postgres/values.yaml

Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
+replicaCount: 1
+image:
+  repository: ghcr.io/vllm-project/semantic-router/extproc
+  tag: latest
+  pullPolicy: Never
+config:
+  version: v0.3
+  listeners: []
+  providers:
+    defaults:
+      default_model: openai/gpt-oss-20b
+    models:
+      - name: openai/gpt-oss-20b
+        backend_refs:
+          - name: test-endpoint
+            endpoint: mock-vllm.default.svc.cluster.local:8000
+            weight: 1
+  routing:
+    decisions:
+      - name: default_decision
+        description: Default catch-all decision
+        priority: 1
+        rules:
+          operator: AND
+          conditions: []
+        modelRefs:
+          - model: openai/gpt-oss-20b
+            use_reasoning: false
+        plugins:
+          - type: router_replay
+            configuration:
+              enabled: true
+              max_records: 1000
+              capture_request_body: true
+              capture_response_body: true
+              max_body_bytes: 65536
+    signals: {}
+    modelCards:
+      - name: openai/gpt-oss-20b
+  global:
+    router:
+      strategy: priority
+    services:
+      response_api:
+        enabled: true
+        store_backend: memory
+        ttl_seconds: 86400
+        max_responses: 1000
+      router_replay:
+        store_backend: postgres
+        ttl_seconds: 2592000
+        async_writes: false
+        postgres:
+          host: postgres.default.svc.cluster.local
+          port: 5432
+          database: vsr
+          user: router
+          password: router-secret
+          ssl_mode: disable
+          max_open_conns: 10
+          max_idle_conns: 5
+          conn_max_lifetime: 300
+          table_name: router_replay
+      api:
+        batch_classification:
+          max_batch_size: 100
+          concurrency_threshold: 5
+          max_concurrency: 8
+          metrics:
+            enabled: true
+            detailed_goroutine_tracking: false
+            high_resolution_timing: false
+            sample_rate: 1.0
+      observability:
+        tracing:
+          enabled: false
+    stores:
+      semantic_cache:
+        embedding_model: mmbert
+      memory:
+        embedding_model: mmbert
+      vector_store:
+        embedding_model: mmbert
+    integrations: {}
+    model_catalog:
+      kbs: []
+      modules:
+        prompt_guard:
+          enabled: false
+          model_ref: ""
+          model_id: ""
+          jailbreak_mapping_path: ""
+          use_mmbert_32k: false
+        classifier:
+          domain:
+            model_ref: ""
+            model_id: ""
+            category_mapping_path: ""
+            use_mmbert_32k: false
+          pii:
+            model_ref: ""
+            model_id: ""
+            pii_mapping_path: ""
+            use_mmbert_32k: false
+resources:
+  limits:
+    cpu: '2'
+    memory: 10Gi
+  requests:
+    cpu: 500m
+    memory: 2Gi
