vllm-project
diff --git a/website/i18n/zh-Hans/docusaurus-plugin-content-docs/current/proposals/Prism-153key.md b/website/i18n/zh-Hans/docusaurus-plugin-content-docs/current/proposals/Prism-153key.md
Lines changed: 206 additions & 0 deletions
diff --git a/website/i18n/zh-Hans/docusaurus-plugin-content-docs/current/proposals/advanced-tool-filtering.md b/website/i18n/zh-Hans/docusaurus-plugin-content-docs/current/proposals/advanced-tool-filtering.md
Lines changed: 140 additions & 0 deletions
@@ -0,0 +1,206 @@
+---
+translation:
+  source_commit: "eb9f384f"
+  source_file: "docs/proposals/Prism-153key.md"
+  outdated: false
+---
+
+# [Proposal] PRISM: A 153-Key Legitimacy Layer for vLLM-SR Model Selection
+
+**Issue:** #1422  
+**Author:** Mossaab Souaissa (MSIA Systems)  
+**Milestone:** v0.3 (Themis)  
+**Reference:** https://doi.org/10.5281/zenodo.18750029  
+**Whitepaper:** https://github.com/user-attachments/files/25750911/PRISM-Vllm-SR-whitepaper-COMPLET-EN.pdf
+
+---
+
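+## 1. Background and Motivation
+
+vLLM-SR answers: **Which model is best suited to this request?**  
+PRISM answers: **Is the selected model qualified to answer this specific query?**
+
+The two are complementary, not redundant.
+
+### The "Lying Model" Problem
+
+Without structural constraints, any model can answer any query, even one outside its training domain. This produces **confident hallucination**: the model improvises high-confidence answers in domains where it has no real expertise.
+
+The current vLLM-SR pipeline performs **no** legitimacy check after a model is selected. PRISM adds that layer **without modifying** the existing routing logic.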
+
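+### Design Principle: Additive Extension, No Compatibility Breaks
+
+PRISM integrates as an **optional extension** to existing components:
+
+- If a model's configuration has no `prism.enabled` → behavior is **unchanged**
+- If a decision has no plugin block with `type: "prism-execution"` → Key 3 is **skipped**
+- If the PRISM registry is not ready at request time → fall back to `"general"` → standard vLLM-SR routing
+
+| PRISM Component | Integration Point | Type |
+|----------------|------------------|------|
+| Key 1: QUALIFICATION | `model_config` in `config.yaml` | `prism.enabled: true` + auto-discovery at startup |
+| Key 2: CLASSIFICATION | `req_filter_classification.go` | New evaluation block, in-process via `candle-binding` |
+| Key 3: EXECUTION | `req_filter_prism_execution.go` | New ExtProc filter, following the `req_filter_jailbreak.go` pattern |
+| 153-Registry | `pkg/registry/prism_registry.go` | In-memory store, populated asynchronously by Key 1 at startup |
+
+**Scope of this PR: hybrid mode only** (Key 1 + Key 2 + Key 3). `fine_filter` and `coarse_filter` are follow-up variants; see Section 9.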
+
+---
+
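+## 2. Architecture Overview
+
+(The ASCII flow diagram is identical to the English version; see `docs/proposals/Prism-153key.md`.)
+
+Key points: **PHASE 1** adds `runPrismClassification()`; **PHASE 3** adds `filterCandidatesByPrism()` and the reroute loop; **PHASE 4** adds `runPrismExecution()`. At startup, `NewPrismRegistry` runs `qualifyAllModels()` in the background; until the registry is ready, Key 2 sets the domain to `"general"`.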
+
+---
+
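+## 3. Key 1: QUALIFICATION (Asynchronous Auto-Discovery at Startup)
+
+### 3.1 Principle
+
+Operators declare **only** `prism.enabled: true` in `model_config`.
+
+In a background goroutine, `NewPrismRegistry()` sends the Key 1 qualification prompt to each enabled model, parses the JSON self-declaration, and writes it to the 153-Registry. The router starts **immediately**; PRISM takes effect for subsequent requests only after the goroutine finishes.
+
+During initialization: `IsReady()` is `false` → `runPrismClassification()` sets `ctx.PrismDomain = "general"` → standard routing without PRISM.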
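
The startup behavior described above can be sketched as follows. This is a minimal illustration, not the proposal's code: `sketchRegistry`, `newSketchRegistry`, and `domainFor` are hypothetical names, and the `qualify` callback merely stands in for `qualifyAllModels()`.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// sketchRegistry is a hypothetical stand-in for the proposal's PrismRegistry.
type sketchRegistry struct {
	ready atomic.Bool   // flipped once background qualification finishes
	done  chan struct{} // closed after ready is set; used here only for the demo
}

// newSketchRegistry returns immediately and runs qualify in the background,
// so the caller (the router) is never blocked on model qualification.
func newSketchRegistry(qualify func()) *sketchRegistry {
	r := &sketchRegistry{done: make(chan struct{})}
	go func() {
		qualify()           // stands in for qualifyAllModels()
		r.ready.Store(true) // PRISM applies to requests from here on
		close(r.done)
	}()
	return r
}

// IsReady reports whether background qualification has completed.
func (r *sketchRegistry) IsReady() bool { return r.ready.Load() }

// domainFor mirrors the fallback: "general" until the registry is ready.
func domainFor(r *sketchRegistry, classified string) string {
	if !r.IsReady() {
		return "general"
	}
	return classified
}

func main() {
	release := make(chan struct{})
	r := newSketchRegistry(func() { <-release }) // qualification blocks until released

	fmt.Println(domainFor(r, "metallurgy")) // registry still initializing
	close(release)
	<-r.done
	fmt.Println(domainFor(r, "metallurgy")) // registry ready
}
```

An atomic flag keeps the per-request readiness check lock-free; whether the actual registry uses an atomic, a mutex, or a channel is not stated in this document.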
+
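+### 3.2 config.yaml: Minimal Human Declaration
+
+(The YAML example is identical to the English version.)
+
+### 3.3 Key 1 Qualification Prompt
+
+(The prompt text is in English and identical to the English version; it is used to interact with the model.)
+
+**HTTP call:** `POST http://{VLLMEndpoint.Address}:{VLLMEndpoint.Port}/v1/chat/completions`  
+**Authentication:** `Authorization: Bearer {ModelParams.AccessKey}` (if AccessKey is set)  
+**Timeout:** 30 seconds per model  
+**On failure, or if the domain is "general"/"unknown":** the model is not added to the registry and a warning is logged.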
+
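+### 3.4 Go Structs
+
+(The definitions of `PrismModelConfig`, `PrismThresholds`, `PrismConfig`, and related types match the code blocks in the English version.)
+
+### 3.5 153-Registry: Asynchronously Initialized In-Memory Store
+
+(`RegistryEntry`, `PrismRegistry`, and their methods are identical to the English version.)
+
+### 3.6 Score Tiers
+
+| Score | Level | Meaning |
+|-------|-------|---------|
+| 0.00 - 0.30 | Superficial | General knowledge, no specialization |
+| 0.31 - 0.60 | Basic | Knows the specialist vocabulary, limited depth |
+| 0.61 - 0.90 | Confirmed | Practical expertise in processes and methods |
+| 0.91 - 1.00 | Expert | Standards, formulas, and complex cases |
+
+### 3.7 Experience Score Rules
+
+(`init` / `ACCEPTED` / `REFUSED` / `clamp` / `status` are the same as in the English version.)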
+
+---
+
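+## 4. Key 2: CLASSIFICATION (New Block in `req_filter_classification.go`)
+
+### 4.1 Principle
+
+Key 2 is a new evaluation block in `req_filter_classification.go` that runs in-process and calls `candle.GetEmbedding()` from `candle-binding`. Key 2 **does not answer the question**; it only classifies intent and writes the result to `RequestContext` for Key 3 to use in PHASE 4.
+
+### 4.2 RequestContext Extension
+
+(Fields such as `PrismDomain` and `PrismConfidence` are identical to the English version.)
+
+### 4.3 Embedding API
+
+(The `GetEmbedding` signature is identical to the English version.)
+
+### 4.4 Domain Embedding Cache
+
+At the end of `qualifyAllModels()`, each domain's embedding is **precomputed** and stored in `RegistryEntry.DomainEmbedding`, so it is not recomputed on every request. At runtime, Key 2 computes a single embedding, for the **query**, and compares it against the cached domain embeddings by cosine similarity.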
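
The runtime comparison can be sketched as below, assuming embeddings are `[]float32` vectors of equal length. `cosine` and `bestDomain` are illustrative names, the map stands in for the cached `DomainEmbedding` values, and the example vectors are made up; none of this is taken from the proposal's implementation.

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity of a and b, or 0 when the vectors
// differ in length or either one has zero norm.
func cosine(a, b []float32) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		x, y := float64(a[i]), float64(b[i])
		dot += x * y
		na += x * x
		nb += y * y
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// bestDomain compares one query embedding against the cached domain
// embeddings and returns the closest domain with its similarity.
// With an empty cache it returns "general" and 0.
func bestDomain(query []float32, cached map[string][]float32) (string, float64) {
	if len(cached) == 0 {
		return "general", 0
	}
	best, bestSim := "general", math.Inf(-1)
	for domain, emb := range cached {
		if sim := cosine(query, emb); sim > bestSim {
			best, bestSim = domain, sim
		}
	}
	return best, bestSim
}

func main() {
	// Toy 2-dimensional embeddings, for illustration only.
	cached := map[string][]float32{
		"metallurgy": {1, 0},
		"finance":    {0, 1},
	}
	domain, sim := bestDomain([]float32{0.9, 0.1}, cached)
	fmt.Printf("%s %.2f\n", domain, sim)
}
```

The per-request cost is one pass over N cached vectors, which is why only the query embedding has to be computed at request time.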
+
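+### 4.5 Fallback Rules
+
+If the registry is not ready, `GetEmbedding` returns an error, the best similarity is below `ConfidenceStrict`, or the domain is unknown, then `ctx.PrismDomain = "general"` in every case; no error is raised and the request is not blocked.
+
+### 4.6 Confidence Bands and Key 3 Behavior
+
+(Same as the table in the English version: <0.40 skips Key 3; 0.40–0.69 is the strict band; 0.70–0.89 is the flexible band; ≥0.90 is auto-accepted; and so on.)
+
+### 4.7 Keyword Extraction
+
+(`extractKeywords` is identical to the English version.)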
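
The bands listed above can be expressed as a small lookup. This sketch hardcodes the four cut-offs from the summary and treats each band as a half-open interval, so a value such as 0.695 lands in the strict band; that boundary handling, the `key3Mode` type, and the function name `bandFor` are assumptions, and the actual implementation presumably reads its thresholds from configuration.

```go
package main

import "fmt"

// key3Mode names the Key 3 behavior for a confidence band (hypothetical type).
type key3Mode string

const (
	skipKey3   key3Mode = "skip"
	strictBand key3Mode = "strict"
	flexBand   key3Mode = "flexible"
	autoAccept key3Mode = "auto-accept"
)

// bandFor maps a Key 2 confidence to a Key 3 behavior using half-open
// intervals: [0, 0.40), [0.40, 0.70), [0.70, 0.90), [0.90, 1].
func bandFor(confidence float64) key3Mode {
	switch {
	case confidence < 0.40:
		return skipKey3
	case confidence < 0.70:
		return strictBand
	case confidence < 0.90:
		return flexBand
	default:
		return autoAccept
	}
}

func main() {
	for _, c := range []float64{0.25, 0.55, 0.80, 0.95} {
		fmt.Printf("%.2f -> %s\n", c, bandFor(c))
	}
}
```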
+
+---
+
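+## 5. Key 3: EXECUTION (New ExtProc Filter)
+
+### 5.1 Principle
+
+`runPrismExecution()` lives in `req_filter_prism_execution.go` and follows the **same** pattern as `req_filter_jailbreak.go`. It forces the selected model to prove its legitimacy against the domain that Key 1 fixed at startup, rather than whatever it might claim at runtime.
+
+### 5.2 Reference Pattern
+
+(The `runJailbreakFilter` and Key 3 implementation code blocks are identical to the English version.)
+
+### 5.3 Shared Helpers
+
+(`req_filter_prism_helpers.go` is identical to the English version.)
+
+### 5.4 Reroute Loop
+
+(The loop in `processor_req_body.go` is identical to the English version.)
+
+### 5.5 Global REFUSED and HTTP Response
+
+`ctx.Blocked = true` is set exactly as in the jailbreak path, reusing the existing `buildResponse()`.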
+
+---
+
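+## 6. filterCandidatesByPrism: Pre-Filtering Before Model Selection
+
+(The logic matches `filterCandidatesByPrism` in the English version: no filtering when the domain is general or the registry is not ready; when no candidate is PRISM-qualified, it falls back to the full candidate set and **never** blocks standard routing because of PRISM alone.)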
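
The three rules in that summary can be sketched as follows. The `candidate` type, its single `Domain` field, and the function `filterCandidatesSketch` are hypothetical simplifications; the real `filterCandidatesByPrism` signature lives in the English proposal and is not reproduced here.

```go
package main

import "fmt"

// candidate is a hypothetical stand-in for a routed model candidate.
type candidate struct {
	Name   string
	Domain string // domain fixed by Key 1; empty if the model is not registered
}

// filterCandidatesSketch keeps only candidates qualified for the query domain,
// but returns the input unchanged whenever PRISM cannot make a decision.
func filterCandidatesSketch(cands []candidate, domain string, registryReady bool) []candidate {
	if !registryReady || domain == "general" {
		return cands // rule 1: no filtering
	}
	var qualified []candidate
	for _, c := range cands {
		if c.Domain == domain {
			qualified = append(qualified, c)
		}
	}
	if len(qualified) == 0 {
		return cands // rule 2: fall back to the full set, never block on PRISM alone
	}
	return qualified // rule 3: restrict to PRISM-qualified candidates
}

func main() {
	cands := []candidate{
		{Name: "model-a", Domain: "finance"},
		{Name: "model-b", Domain: "metallurgy"},
	}
	fmt.Println(len(filterCandidatesSketch(cands, "finance", true))) // one qualified candidate
	fmt.Println(len(filterCandidatesSketch(cands, "law", true)))     // none qualified: full set
	fmt.Println(len(filterCandidatesSketch(cands, "finance", false))) // registry not ready: full set
}
```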
+
+---
+
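+## 7. Global PRISM Configuration
+
+(The YAML `prism:` block is identical to the English version; its comments note that the default thresholds target industrial scenarios and that general-purpose scenarios can lower `confidence_strict` and `expertise_min_score`.)
+
+---
+
+## 8. New and Modified Files
+
+(The file list is identical to Section 8 of the English version.)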
+
+---
+
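+## 9. Three Integration Modes (Follow-Up Variants)
+
+| Mode | Active Keys | Scope |
+|------|------------|------|
+| `hybrid` | Key 1 + 2 + 3 | **This PR**; strongest legitimacy guarantee |
+| `coarse_filter` | Key 1 + 2 | Future: pre-routing filter only |
+| `fine_filter` | Key 1 + 3 | Future: post-routing validation only |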
+
+---
+
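+## 10. Open Questions (@HuaminChen, @Xunzhuo)
+
+1. Whether `selectModelFromCandidates` takes `decision` as a third parameter.  
+2. Whether an in-memory-only 153-Registry is acceptable for v0.3, or whether Redis/SQLite is needed.  
+3. Consistency of Key 2 domain embeddings with `candle-binding` under concurrency and CGo.  
+4. Whether to ship `config.recipe-prism-general.yaml` to lower the thresholds for general-purpose scenarios.  
+5. The product trade-off of defaulting `UnregisteredPolicy` to `passthrough` vs. `refuse`.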
+
+---
+
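+## 11. Performance Notes
+
+Key 2 costs one `GetEmbedding` call per request plus a comparison against N precached domain embeddings; in industrial scenarios N is typically 3–10, so the overhead is negligible. With a very large number of domains, embedding similarity results could be cached by query hash (a future optimization).
+
+---
+
+## 12. References
+
+- The PRISM whitepaper, Zenodo DOI, Issue #1422, and Draft PR #1425 links are the same as in the English version.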
@@ -0,0 +1,140 @@
+---
+translation:
+  source_commit: "5f14781c"
+  source_file: "docs/proposals/advanced-tool-filtering.md"
+  outdated: false
+---
+
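+# Advanced Tool Filtering for Tool Selection
+
+Issue: [#1002](https://github.com/vllm-project/semantic-router/issues/1002)
+
+---
+
+## Current State
+
+Tool selection currently uses only embedding similarity, a similarity threshold, and top-k. When embeddings are close but the intent differs, it can select tools from the wrong domain.
+
+[#1002](https://github.com/vllm-project/semantic-router/issues/1002) proposes adding advanced tool filtering to reduce wrong selections through finer-grained relevance filtering, while keeping the default behavior unchanged.
+
+## Proposal
+
+Add an **optional advanced filtering stage** after embedding-based candidate retrieval. This stage applies deterministic filters (allow/block lists, optional category gating, a lexical overlap threshold) and a **combined-score reranker** that fuses embedding similarity with lexical, tag, name, and category signals. If `advanced_filtering.enabled=false`, behavior is identical to the existing implementation.
+
+Advantages: bounded latency, no new model dependency, and behavior that is fully explainable from configuration.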
+
+## Comparative Test Results
+
+Test setup:
+
+- Query set: 20 queries (17 positive, 3 negative) covering weather, email, search, calculation, calendar, and similar scenarios
+- Tool library: 5 tools (get_weather, search_web, calculate, send_email, create_calendar_event)
+- Iterations: 10
+- Advanced filtering config: `min_lexical_overlap=1`, `min_combined_score=0.35`, `weights={embed:0.7, lexical:0.2, tag:0.05, name:0.05}`
+
+Evaluation results:
+
+![Advanced tool filtering evaluation results](../../../../../static/img/proposals/advanced-tool-filtering.png)
+
+| Metric | Baseline | Advanced | Delta |
+|------|------|------|------|
+| **Accuracy** | 55.00% | 90.00% | **+35.00%** |
+| **Precision** | 78.57% | 94.12% | **+15.55%** |
+| **Recall** | 64.71% | 94.12% | **+29.41%** |
+| **False positive rate** | 100.00% | 33.33% | **-66.67%** |
+| Mean latency | 0.0162 ms | 0.0197 ms | +0.0036 ms |
+| P95 latency | 0.0256 ms | 0.0288 ms | +0.0032 ms |
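
The proposal does not list raw counts, but the four quality metrics are consistent with a single confusion matrix per configuration if each of the 20 queries is scored as one binary outcome (17 positives, 3 negatives). The sketch below recomputes the percentages from those inferred counts; the counts themselves are a back-calculation under that assumption, not data from the evaluation.

```go
package main

import "fmt"

// confusion holds binary classification counts.
type confusion struct{ TP, FP, FN, TN int }

func (c confusion) accuracy() float64 {
	return float64(c.TP+c.TN) / float64(c.TP+c.FP+c.FN+c.TN)
}
func (c confusion) precision() float64 { return float64(c.TP) / float64(c.TP+c.FP) }
func (c confusion) recall() float64    { return float64(c.TP) / float64(c.TP+c.FN) }
func (c confusion) fpr() float64       { return float64(c.FP) / float64(c.FP+c.TN) }

func main() {
	// Inferred counts: TP+FN = 17 positives, FP+TN = 3 negatives.
	rows := []struct {
		name string
		c    confusion
	}{
		{"baseline", confusion{TP: 11, FP: 3, FN: 6, TN: 0}},
		{"advanced", confusion{TP: 16, FP: 1, FN: 1, TN: 2}},
	}
	for _, r := range rows {
		fmt.Printf("%-8s acc=%.2f%% prec=%.2f%% rec=%.2f%% fpr=%.2f%%\n",
			r.name, 100*r.c.accuracy(), 100*r.c.precision(), 100*r.c.recall(), 100*r.c.fpr())
	}
}
```

Read this way, the baseline accepts a tool for all three negative queries, while the advanced filter rejects two of them and recovers five additional positives.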
+
+## Data Flow
+
+```mermaid
+flowchart TD
+    A[Request tool_choice=auto] --> B[Extract classification text]
+    B --> C{Tools DB enabled?}
+    C -->|No| D[Exit]
+    C -->|Yes| E{advanced_filtering.enabled?}
+    E -->|No| F[FindSimilarTools: embedding + threshold + top-k]
+    E -->|Yes| G[FindSimilarToolsWithScores: embedding + threshold + candidate pool]
+    G --> H[Allow/block + category filter + lexical overlap]
+    H --> I[Compute combined score + rerank]
+    I --> J[Select top-k tools]
+    F --> K[Update request tools]
+    J --> K[Update request tools]
+```
+
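+## Configuration Changes
+
+Advanced filtering is disabled by default. The fields below take effect when it is enabled.
+
+| Field | Type | Default | Range / Notes |
+|------|------|------|-------------|
+| `enabled` | bool | `false` | Enables advanced filtering. |
+| `candidate_pool_size` | int | `max(top_k*5, 20)` | Used directly if set and >0. |
+| `min_lexical_overlap` | int | `0` | Minimum number of unique tokens shared between the query and the tool's vocabulary. |
+| `min_combined_score` | float | `0.0` | Combined score threshold, range [0.0, 1.0]. |
+| `weights.embed` | float | `1.0` | Defaults to 1.0 when no weights are set. |
+| `weights.lexical` | float | `0.0` | Optional weight, range [0.0, 1.0]. |
+| `weights.tag` | float | `0.0` | Optional weight, range [0.0, 1.0]. |
+| `weights.name` | float | `0.0` | Optional weight, range [0.0, 1.0]. |
+| `weights.category` | float | `0.0` | Optional weight, range [0.0, 1.0]. |
+| `use_category_filter` | bool | `false` | When true, filters by category if confidence is sufficient. |
+| `category_confidence_threshold` | float | `nil` | If set, category filtering applies only when decision confidence ≥ threshold. |
+| `allow_tools` | []string | `[]` | Allowlist of tool names; when non-empty, only these tools are kept. |
+| `block_tools` | []string | `[]` | Blocklist of tool names. |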
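
Two of the rows above imply small pieces of logic: the `candidate_pool_size` default and the allow/block lists. A minimal sketch, with hypothetical helper names (`poolSize`, `applyNameLists`, `toSet`); letting the blocklist win when a tool appears in both lists is an assumption, since the table does not say which takes precedence.

```go
package main

import "fmt"

// poolSize returns the configured pool size if it is set and >0,
// otherwise the documented default max(top_k*5, 20).
func poolSize(configured, topK int) int {
	if configured > 0 {
		return configured
	}
	if n := topK * 5; n > 20 {
		return n
	}
	return 20
}

// toSet builds a membership set from a list of names.
func toSet(items []string) map[string]bool {
	set := make(map[string]bool, len(items))
	for _, it := range items {
		set[it] = true
	}
	return set
}

// applyNameLists keeps tools that pass the allowlist (when non-empty)
// and are not on the blocklist. Assumption: block wins over allow.
func applyNameLists(names, allow, block []string) []string {
	allowed, blocked := toSet(allow), toSet(block)
	var kept []string
	for _, n := range names {
		if len(allowed) > 0 && !allowed[n] {
			continue
		}
		if blocked[n] {
			continue
		}
		kept = append(kept, n)
	}
	return kept
}

func main() {
	fmt.Println(poolSize(0, 3), poolSize(0, 5), poolSize(50, 3))
	tools := []string{"get_weather", "search_web", "send_email"}
	fmt.Println(applyNameLists(tools, []string{"get_weather", "send_email"}, []string{"send_email"}))
}
```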
+
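+## Scoring and Filtering Implementation
+
+### Tokenization
+
+Rule: lowercase the text and split on non-alphanumeric characters. Only Unicode letters and digits count as token characters.  
+Implementation: [src/semantic-router/pkg/tools/relevance.go#L240](https://github.com/samzong/semantic-router/blob/feat/advanced-tool-filtering/src/semantic-router/pkg/tools/relevance.go#L240).
+
+### Lexical Overlap
+
+Lexical overlap counts the **unique tokens** that the query shares with the following tool fields:
+
+- Tool name
+- Tool description
+- Tool category
+
+Tags are not included; they are a separate signal.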
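
The tokenization rule and the overlap count can be sketched with the standard library alone. This is an independent illustration of the stated rules, not the code at `relevance.go#L240`; the function names and the example tool fields are made up.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// tokenize lowercases s and splits it wherever a rune is neither a
// Unicode letter nor a Unicode digit.
func tokenize(s string) []string {
	return strings.FieldsFunc(strings.ToLower(s), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// lexicalOverlap counts the unique query tokens that also occur in the
// tool's name, description, or category. Tags are deliberately excluded.
func lexicalOverlap(query, name, description, category string) int {
	vocab := make(map[string]struct{})
	for _, field := range []string{name, description, category} {
		for _, tok := range tokenize(field) {
			vocab[tok] = struct{}{}
		}
	}
	seen := make(map[string]struct{})
	count := 0
	for _, tok := range tokenize(query) {
		if _, dup := seen[tok]; dup {
			continue // count each query token once
		}
		seen[tok] = struct{}{}
		if _, ok := vocab[tok]; ok {
			count++
		}
	}
	return count
}

func main() {
	fmt.Println(tokenize("get_weather")) // underscore is a separator
	n := lexicalOverlap(
		"What's the weather in Paris? Weather now!",
		"get_weather", "Get the current weather for a city", "weather",
	)
	fmt.Println(n)
}
```

Note that this sketch applies no stop-word removal, so a common word such as "the" counts toward the overlap; whether the real implementation filters such words is not stated here.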
+
+### Combined Score Formula
+
+For each candidate tool:
+
+```
+combined = (w_embed * embed + w_lexical * lexical + w_tag * tag + w_name * name + w_category * category) / (w_embed + w_lexical + w_tag + w_name + w_category)
+```
+
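+- `embed` is the similarity score, clamped to [0,1].
+- `lexical` and `tag` are overlap scores, normalized by the number of query tokens / tag tokens.
+- `name` and `category` are binary scores (0 or 1).
+- If no weights are set, embed defaults to 1.0.
+- If all weights are explicitly 0, the combined score is 0; when `min_combined_score > 0`, every candidate is then filtered out.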
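
The formula and its edge cases can be sketched as a weighted average. The `weights` and `signals` structs and the function names are hypothetical, the signal values in `main` are invented for illustration, and the sketch assumes the weights have already been defaulted (distinguishing "unset" from "explicitly 0" would need a pointer or presence flag in the config type).

```go
package main

import (
	"fmt"
	"math"
)

// weights and signals are hypothetical containers for the five terms.
type weights struct{ Embed, Lexical, Tag, Name, Category float64 }
type signals struct{ Embed, Lexical, Tag, Name, Category float64 }

// clamp01 limits x to the range [0, 1].
func clamp01(x float64) float64 { return math.Max(0, math.Min(1, x)) }

// defaultWeights is the documented default when no weights are set.
func defaultWeights() weights { return weights{Embed: 1.0} }

// combinedScore is the weighted average of the signals. It returns 0 when
// every weight is 0, matching the rule for explicitly zeroed weights.
func combinedScore(w weights, s signals) float64 {
	total := w.Embed + w.Lexical + w.Tag + w.Name + w.Category
	if total == 0 {
		return 0
	}
	sum := w.Embed*clamp01(s.Embed) +
		w.Lexical*s.Lexical +
		w.Tag*s.Tag +
		w.Name*s.Name +
		w.Category*s.Category
	return sum / total
}

func main() {
	// Weights from the test configuration; signal values are made up.
	w := weights{Embed: 0.7, Lexical: 0.2, Tag: 0.05, Name: 0.05}
	s := signals{Embed: 0.8, Lexical: 0.5, Tag: 0, Name: 1, Category: 0}
	fmt.Printf("%.2f\n", combinedScore(w, s))
	fmt.Printf("%.2f\n", combinedScore(defaultWeights(), s))
}
```

With the test weights, this candidate scores (0.56 + 0.10 + 0 + 0.05) / 1.0 = 0.71, which clears the `min_combined_score=0.35` threshold used in the evaluation; with default weights the score reduces to the embedding similarity alone.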
+
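+### Category Confidence Gating
+
+Category filtering applies only when all of the following hold:
+
+- `use_category_filter` is true,
+- a category is present, and
+- decision confidence ≥ `category_confidence_threshold` (if set).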
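
The three conditions map onto a short predicate. In this sketch the optional threshold is modeled as a `*float64` so that `nil` means "not set", mirroring the `nil` default in the configuration table; the function name and parameter layout are assumptions.

```go
package main

import "fmt"

// categoryFilterActive reports whether category filtering should apply.
// A nil threshold means no confidence requirement was configured.
func categoryFilterActive(useFilter bool, category string, confidence float64, threshold *float64) bool {
	if !useFilter || category == "" {
		return false
	}
	if threshold != nil && confidence < *threshold {
		return false
	}
	return true
}

func main() {
	th := 0.8
	fmt.Println(categoryFilterActive(true, "weather", 0.9, &th)) // confident enough
	fmt.Println(categoryFilterActive(true, "weather", 0.5, &th)) // below threshold
	fmt.Println(categoryFilterActive(true, "weather", 0.5, nil)) // no threshold set
}
```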
+
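+## Error Handling and Fallback
+
+- If tool selection fails and `tools.fallback_to_empty=true`: the request continues **without tools** and a warning is logged.
+- If `fallback_to_empty=false`: a classification error is returned.
+- Invalid advanced configuration is rejected at load time (range checks in `validator.go`).
+
+## API Changes
+
+New or modified APIs: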
+
+```go
+// src/semantic-router/pkg/tools/tools.go
+func (db *ToolsDatabase) FindSimilarToolsWithScores(query string, topK int) ([]ToolSimilarity, error)
+
+// src/semantic-router/pkg/tools/relevance.go
+func FilterAndRankTools(query string, candidates []ToolSimilarity, topK int, advanced *config.AdvancedToolFilteringConfig, selectedCategory string) []openai.ChatCompletionToolParam
+```