SignalRAG

A fast web-search RAG workbench inspired by ChatGPT Search:

rewrites the user question into targeted search queries
searches multiple providers concurrently
fetches and extracts source pages in parallel
uses a lightweight query planner before answer generation
exposes search controls for domains, recency, locale, and citation verifier choice
evaluates retrieval quality with a CRAG-style corrective pass
runs multi-step retrieval in Deep Research mode
integrates with Chromium through a local search URL and unpacked extension
fuses multi-query search results with reciprocal rank fusion before fetching
ranks passages with source-aware contextual BM25 signals
packs answer context with query-aware compression and long-context reordering
reranks passages with hybrid lexical scoring and source-quality signals
generates a cited answer from retrieved evidence and returns claim-level citation checks
falls back to extractive answers when no LLM API key is configured
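
The reciprocal rank fusion step listed above can be illustrated with a minimal sketch. It is not SignalRAG's actual code: the function name, the constant `k=60` (the value commonly used with RRF), and the alphabetical tie-break are assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked URL lists into one list (hypothetical helper)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, url in enumerate(ranking, start=1):
            # Each appearance contributes 1 / (k + rank); high ranks count more.
            scores[url] += 1.0 / (k + rank)
    # Highest fused score first; ties break alphabetically so output is stable.
    return sorted(scores, key=lambda url: (-scores[url], url))


# One ranked list per query rewrite (illustrative URLs only).
rankings = [
    ["a.com", "b.com", "c.com"],
    ["b.com", "a.com", "d.com"],
    ["b.com", "e.com"],
]
fused = reciprocal_rank_fusion(rankings)
print(fused)  # ['b.com', 'a.com', 'e.com', 'c.com', 'd.com']
```

URLs that appear in several rewrites (here `b.com` and `a.com`) rise to the top, which is why fusing before fetching concentrates the page budget on the most consistently retrieved sources.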

Quick Start

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python -m fast_rag.app

Open http://127.0.0.1:8000.

Chromium Integration

SignalRAG exposes a search-engine-compatible URL:

http://127.0.0.1:8000/engine?q=%s&mode=pro

It also includes a Manifest V3 extension in extensions/signalrag-chromium with:

sr omnibox keyword search.
selected-text context menu search.
browser side panel search.
extension options for the local API URL and default mode.

Load it from chrome://extensions with Developer mode and "Load unpacked". See extensions/signalrag-chromium/README.md.

Optional Accuracy Upgrades

Set these environment variables before starting the server:

export OPENAI_API_KEY="..."
export OPENAI_MODEL="gpt-4.1-mini"
export DEEPSEEK_API_KEY="..."
export DEEPSEEK_MODEL="deepseek-v4-flash"
export DEEPSEEK_PLANNER_MODEL="deepseek-v4-flash"
export DEEPSEEK_VERIFIER_MODEL="deepseek-v4-flash"
export BRAVE_API_KEY="..."

Without API keys, the app still works using DuckDuckGo, Bing, and Yahoo HTML search fallbacks plus an extractive cited answer.

If both DeepSeek and OpenAI keys are present, DeepSeek is used first by default. Override with:

export LLM_PROVIDER="openai"   # or deepseek / auto
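
The selection rule described above might look roughly like the following sketch. The function name is hypothetical, and the behavior when an explicit `LLM_PROVIDER` points at a provider whose key is missing (falling through to the automatic order) is an assumption, not something this README specifies.

```python
import os


def choose_llm_provider(env=None):
    """Pick "deepseek", "openai", or "extractive" from environment variables."""
    env = os.environ if env is None else env
    preference = env.get("LLM_PROVIDER", "auto").strip().lower()
    has_deepseek = bool(env.get("DEEPSEEK_API_KEY"))
    has_openai = bool(env.get("OPENAI_API_KEY"))

    # Honor an explicit preference when its key is actually configured.
    if preference == "openai" and has_openai:
        return "openai"
    if preference == "deepseek" and has_deepseek:
        return "deepseek"

    # "auto" (or an unusable explicit choice, by assumption): DeepSeek first.
    if has_deepseek:
        return "deepseek"
    if has_openai:
        return "openai"
    # No LLM key at all: fall back to the extractive cited answer.
    return "extractive"


print(choose_llm_provider({"DEEPSEEK_API_KEY": "k1", "OPENAI_API_KEY": "k2"}))
```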

API

curl -X POST http://127.0.0.1:8000/api/search \
  -H "Content-Type: application/json" \
  -d '{
    "query":"how does ChatGPT search work",
    "mode":"pro",
    "lens":"official",
    "max_results":10,
    "include_domains":["openai.com","help.openai.com"],
    "exclude_domains":["medium.com"],
    "recency":"year",
    "country":"us",
    "language":"en",
    "citation_verifier":"auto"
  }'

Modes:

fast: low latency, fewer pages, shorter timeouts.
pro: balanced mode for fresher, comparative, or multi-hop questions.
deep: Deep Research mode. It builds several focused research steps, runs them in parallel, deduplicates the evidence, and returns research_trace.

The API response includes query_plan, which reports the planner's inferred intent, freshness need, search depth, and DeepSeek reasoning effort:

none: thinking disabled for simple lookups.
high: thinking enabled for comparisons, recommendations, API/code guidance, multi-hop synthesis, or uncertainty.
max: thinking enabled with max effort only for deep research, long-horizon tasks, formal proof, or many constraints.

Deep Research mode always enables DeepSeek thinking mode with at least reasoning_effort: high, and uses max only when the planner marks the query as a complex long-horizon or max-effort task. This keeps Deep Research stable while still using DeepSeek's OpenAI-format controls: {"thinking":{"type":"enabled"}} plus reasoning_effort of high or max.
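
As a rough illustration of the rule above, the mapping from mode and planner effort to DeepSeek request controls could be sketched as follows. The helper name is hypothetical, and the `{"type": "disabled"}` payload for the `none` level is an assumption; only the enabled form is shown in this README.

```python
def deepseek_thinking_controls(mode, planner_effort):
    """Return request controls for a given mode and planner effort (sketch)."""
    if planner_effort not in {"none", "high", "max"}:
        raise ValueError(f"unknown reasoning effort: {planner_effort!r}")
    if mode == "deep" and planner_effort == "none":
        # Deep Research never runs below high effort.
        planner_effort = "high"
    if planner_effort == "none":
        # Assumed payload for "thinking disabled"; not confirmed by the README.
        return {"thinking": {"type": "disabled"}}
    return {"thinking": {"type": "enabled"}, "reasoning_effort": planner_effort}


print(deepseek_thinking_controls("deep", "none"))
print(deepseek_thinking_controls("pro", "high"))
```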

The request can include:

lens: web, official, academic, forums, news, pdf, or finance. Lenses add intent-specific query rewrites while still respecting explicit domain, recency, country, and language controls.
include_domains / exclude_domains: allowlist or denylist domains.
recency: any, day, week, month, or year.
country / language: two-letter locale hints for supported providers.
citation_verifier: auto, lexical, or deepseek.
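
A client can check these fields before sending a request. The sketch below is a hypothetical client-side helper, not part of SignalRAG; the allowed values are copied from the lists above, and whether the server rejects unknown values the same way is an assumption.

```python
import json

# Allowed values as documented in this README.
ALLOWED_VALUES = {
    "mode": {"fast", "pro", "deep"},
    "lens": {"web", "official", "academic", "forums", "news", "pdf", "finance"},
    "recency": {"any", "day", "week", "month", "year"},
    "citation_verifier": {"auto", "lexical", "deepseek"},
}


def build_search_payload(query, **options):
    """Validate options and return a dict ready to POST to /api/search."""
    if not query.strip():
        raise ValueError("query must not be empty")
    payload = {"query": query}
    for key, value in options.items():
        allowed = ALLOWED_VALUES.get(key)
        if allowed is not None and value not in allowed:
            raise ValueError(f"{key} must be one of {sorted(allowed)}")
        if key in {"country", "language"}:
            # Locale hints are documented as two-letter codes.
            if not (isinstance(value, str) and len(value) == 2 and value.isalpha()):
                raise ValueError(f"{key} must be a two-letter code")
        payload[key] = value
    return payload


payload = build_search_payload(
    "how does ChatGPT search work",
    mode="pro",
    lens="official",
    include_domains=["openai.com", "help.openai.com"],
    recency="year",
    country="us",
    language="en",
)
print(json.dumps(payload, indent=2))
```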

The response includes:

crag: retrieval quality before and, if needed, after a corrective search.
research_trace: the per-step trace used by Deep Research mode.
context_packing: answer-context compression stats, including strategy, packed evidence count, budget, packed characters, and compression ratio.
candidate_citations[].signals.trust_tier: source credibility tier used by ranking and CRAG, such as government, academic, standards, official_docs, medical, news_wire, reference, general, or low_signal.
claim_citations: per-claim citation trace. With DeepSeek configured, auto uses a judge model for supported/weak/contradicted/insufficient decisions; otherwise it falls back to the fast lexical verifier.

Source Trust Tiers

SignalRAG scores source credibility before final passage ranking. The trust tiers are intentionally conservative:

government: .gov, .mil, .edu-adjacent public institutions, major public agencies, regulators, and intergovernmental institutions such as CDC, FDA, NIH, NIST, SEC, WHO, UN, IMF, OECD, and World Bank.
academic: research repositories, journals, and scholarly publishers such as arXiv, ACL Anthology, Nature, Science, NEJM, JAMA, BMJ, PubMed/NCBI, Cell, Springer, and ScienceDirect.
standards: standards and security bodies such as W3C, IETF, ISO, OWASP, MITRE, NIST CSRC, and CISA.
official_docs: first-party product or developer documentation, including OpenAI, DeepSeek, Anthropic, Google, Microsoft, AWS, GitHub, Python, Perplexity, Tavily, LlamaIndex, and Ragas docs.
medical: evidence-oriented public medical references such as MedlinePlus, Mayo Clinic, Cleveland Clinic, MSD Manuals, and NCI.
news_wire: high-accountability news and public media sources such as AP, Reuters, BBC, and NPR.
reference: broad reference sources such as Britannica and Wikipedia. These are useful for orientation but are boosted less than primary sources.
low_signal: social, forum, or open publishing domains such as Reddit, Quora, Medium, Substack, and Pinterest. These can still be useful for experience-oriented queries, but are not treated as authoritative evidence.

These tiers are based on source-evaluation principles from Google Search's E-E-A-T guidance, academic credibility guidance that prioritizes .edu, .gov, and peer-reviewed evidence, and NCI guidance that health information should come from government agencies, hospitals, universities, medical journals, and professional societies.
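
One simple way to assign tiers like those listed above is a domain table plus top-level-domain rules. The sketch below is illustrative only: the table is a small subset of the sources named above, and the lookup order (specific domains before suffix rules, longest domain first) is an assumption about how such a classifier might be built.

```python
from urllib.parse import urlsplit

# Small illustrative subset of the tiers described above.
DOMAIN_TIERS = {
    "csrc.nist.gov": "standards",
    "w3.org": "standards",
    "arxiv.org": "academic",
    "reuters.com": "news_wire",
    "wikipedia.org": "reference",
    "reddit.com": "low_signal",
    "medium.com": "low_signal",
}
SUFFIX_TIERS = {".gov": "government", ".mil": "government"}


def trust_tier(url):
    """Return a trust tier name for a URL (hypothetical helper)."""
    host = (urlsplit(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Check specific domains first, longest first, so csrc.nist.gov is
    # classified as standards before the generic .gov rule applies.
    for domain in sorted(DOMAIN_TIERS, key=len, reverse=True):
        if host == domain or host.endswith("." + domain):
            return DOMAIN_TIERS[domain]
    for suffix, tier in SUFFIX_TIERS.items():
        if host.endswith(suffix):
            return tier
    return "general"


for url in ("https://www.cdc.gov/flu", "https://csrc.nist.gov/pubs",
            "https://en.wikipedia.org/wiki/BM25", "https://example.com/post"):
    print(url, "->", trust_tier(url))
```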

Further Improvement Roadmap

This roadmap is based on current RAG research and search-product patterns:

Source lenses and search controls: advanced search products expose domain/date/location/source controls and reusable lenses. SignalRAG now supports first-pass source lenses for web, official, academic, forums, news, PDFs, and finance. Next step: make lenses editable and persist custom lens presets in the UI.
- Reference: Kagi Lenses
- Reference: Perplexity Search Filters
- Reference: Tavily Search API
Contextual retrieval: enrich extracted passages with source title, domain, and snippet context before ranking. SignalRAG now applies source-aware contextual BM25 signals before hybrid scoring. Next step: persist section/date metadata and add optional embeddings plus a cross-encoder reranker. Anthropic reports that contextual BM25 + contextual embeddings + reranking substantially reduces retrieval misses.
- Reference: Anthropic Contextual Retrieval
Stronger reranking: keep the current fast BM25-style scorer for first pass, then add an optional cross-encoder or LLM reranker for Pro/Deep modes. Use this only after broad recall, so latency stays controlled.
Query decomposition and RAG-Fusion: generate multiple targeted queries for complex questions, then fuse ranked results with reciprocal rank fusion. SignalRAG now applies RRF before page fetching, so URLs that appear across several query rewrites are prioritized before extraction and passage ranking. Next step: add LLM-generated subquestions for high-complexity queries and tune fusion weights by lens/provider.
- Reference: LlamaIndex Query Transformations
- Reference: RAG-Fusion
Self-reflection and correction loop: extend the current CRAG pass so the model can decide whether retrieval is needed, whether evidence is sufficient, and whether the draft answer needs another retrieval pass.
- Reference: Self-RAG
- Reference: Corrective RAG
Citation and faithfulness evaluation: keep claim-level citation checks, then add offline regression metrics for context relevance, groundedness, answer relevance, faithfulness, and contextual recall.
- Reference: TruLens RAG Triad
- Reference: DeepEval RAG Evaluation
Context packing: avoid dumping too much evidence into the final prompt. SignalRAG now uses query-aware extractive compression, adds source context to every packed passage, and reorders evidence in a "sandwich" pattern so strong sources appear near the beginning and end of the model context. This follows the same practical lesson as LongLLMLingua and lost-in-the-middle research: key information density and position matter, even with long-context models. Next step: add optional LLMLingua-style small-model compression for very large reports.
- Reference: Lost in the Middle
- Reference: LongLLMLingua
Deep Research UX and reasoning: expose a visible plan, progress trace, source controls, exportable report, table of contents, and source list for review. SignalRAG now runs deeper research steps, including countercheck and synthesis, and uses adaptive DeepSeek thinking for final synthesis: high by default, max for long-horizon tasks.
- Reference: ChatGPT Deep Research
- Reference: Perplexity Sonar Deep Research
- Reference: DeepSeek Thinking Mode
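
The "sandwich" reordering mentioned under Context packing can be sketched in a few lines. This is a minimal illustration, assuming passages arrive sorted best-first; the function name is hypothetical and the real packer may weigh positions differently.

```python
def sandwich_reorder(passages_best_first):
    """Place the strongest passages at both ends and the weakest in the middle."""
    front, back = [], []
    for index, passage in enumerate(passages_best_first):
        # Alternate: ranks 1, 3, 5, ... fill the front; 2, 4, 6, ... fill the back.
        if index % 2 == 0:
            front.append(passage)
        else:
            back.append(passage)
    # Reverse the back half so the second-best passage ends up last.
    return front + back[::-1]


ranked = ["p1", "p2", "p3", "p4", "p5"]  # p1 is the strongest passage
print(sandwich_reorder(ranked))  # ['p1', 'p3', 'p5', 'p4', 'p2']
```

With five passages, the two strongest sit at the start and end of the context, and the weakest (`p5`) lands in the middle, where long-context models are reported to attend least.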

Example Runs

These examples were run locally on 2026-05-12 with DeepSeek enabled, HTML search fallbacks, and no Brave API key. They are the best demo cases because they use official sources, produce inline citations, and exercise different parts of the retrieval stack.

| Use case | Query | Mode | Observed result |
| --- | --- | --- | --- |
| Fast official API lookup | `DeepSeek API chat completion base URL model name and first API call` | `fast` | 2 official DeepSeek citations, 3 supported claims, ~4.0s |
| Product search explanation | `How does ChatGPT search work and how does it cite sources?` | `pro` | 2 official OpenAI citations, 3 supported claims, ~9.2s |
| API docs with source controls | `OpenAI web search API citations and domain filtering` | `pro` | OpenAI developer citation, planner chose `high` reasoning, ~12.6s |
| Deep Research trace | `Explain ChatGPT search for Enterprise and Edu data sharing and source citations.` | `deep` | 2 official OpenAI citations, 3 research steps, 5 supported claims, ~19.2s |

Fast official API lookup

curl -X POST http://127.0.0.1:8000/api/search \
  -H "Content-Type: application/json" \
  -d '{
    "query":"DeepSeek API chat completion base URL model name and first API call",
    "mode":"fast",
    "max_results":8,
    "include_domains":["api-docs.deepseek.com"],
    "recency":"year",
    "country":"us",
    "language":"en",
    "citation_verifier":"auto"
  }'

Why this is a strong demo: it shows the lightweight planner choosing reasoning_effort: none, keeps latency low, and returns only official DeepSeek API documentation as citations.

ChatGPT Search citation behavior

curl -X POST http://127.0.0.1:8000/api/search \
  -H "Content-Type: application/json" \
  -d '{
    "query":"How does ChatGPT search work and how does it cite sources?",
    "mode":"pro",
    "max_results":10,
    "include_domains":["openai.com","help.openai.com"],
    "recency":"year",
    "country":"us",
    "language":"en",
    "citation_verifier":"auto"
  }'

Why this is a strong demo: it exercises official-source prioritization, answer citations, and claim-level verification against OpenAI Help Center and OpenAI announcement pages.

API docs with source controls

curl -X POST http://127.0.0.1:8000/api/search \
  -H "Content-Type: application/json" \
  -d '{
    "query":"OpenAI web search API citations and domain filtering",
    "mode":"pro",
    "max_results":10,
    "include_domains":["developers.openai.com","platform.openai.com","openai.com"],
    "recency":"year",
    "country":"us",
    "language":"en",
    "citation_verifier":"auto"
  }'

Why this is a strong demo: it shows include-domain controls, freshness-aware planning for API documentation, and citation grounding from developer docs.

Deep Research trace

curl -X POST http://127.0.0.1:8000/api/search \
  -H "Content-Type: application/json" \
  -d '{
    "query":"Explain ChatGPT search for Enterprise and Edu data sharing and source citations.",
    "mode":"deep",
    "max_results":12,
    "include_domains":["help.openai.com","openai.com"],
    "recency":"year",
    "country":"us",
    "language":"en",
    "citation_verifier":"auto"
  }'

Why this is a strong demo: it runs Deep Research mode, returns a multi-step research_trace, and verifies claims across workspace policy and citation behavior sources.

Recall Evaluation

python -m fast_rag.eval --mode fast --top-k 5
python -m fast_rag.eval --mode pro --top-k 8
python -m fast_rag.eval --mode deep --top-k 10

The evaluator reports recall@k, hit rate, MRR, and latency over a small set of known-answer web-search cases.
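
For readers unfamiliar with these metrics, the sketch below shows how recall@k, hit rate, and MRR are conventionally computed. It is not the code in `fast_rag.eval`: the function name and the case format (a ranked URL list plus a set of expected URLs) are assumptions for illustration.

```python
def retrieval_metrics(cases, k):
    """Compute mean recall@k, hit rate, and MRR over (ranked, expected) cases."""
    recalls, hits, reciprocal_ranks = [], [], []
    for ranked_urls, expected_urls in cases:
        top_k = ranked_urls[:k]
        found = expected_urls & set(top_k)
        # recall@k: fraction of expected sources present in the top k.
        recalls.append(len(found) / len(expected_urls))
        # hit rate: did at least one expected source appear in the top k?
        hits.append(1.0 if found else 0.0)
        # MRR: reciprocal of the rank of the first expected source, else 0.
        first = next(
            (rank for rank, url in enumerate(top_k, start=1) if url in expected_urls),
            None,
        )
        reciprocal_ranks.append(1.0 / first if first else 0.0)
    n = len(cases)
    return {
        "recall@k": sum(recalls) / n,
        "hit_rate": sum(hits) / n,
        "mrr": sum(reciprocal_ranks) / n,
    }


cases = [
    (["a", "b", "c"], {"b"}),       # expected source found at rank 2
    (["x", "y", "z"], {"q", "x"}),  # one of two expected sources, at rank 1
    (["m", "n"], {"z"}),            # miss
]
metrics = retrieval_metrics(cases, k=3)
print(metrics)
```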

End-to-End Benchmark

For RAG systems, do not rely on a single benchmark number. A useful eval suite should cover both retrieval and generation:

Retrieval: recall@k, hit rate, MRR/nDCG, source diversity, latency.
Generation: answer relevance, faithfulness/groundedness, citation coverage, supported-claim rate, contradiction rate, fallback rate, and cost/latency.
Dataset size: use 20-50 golden queries for early development, 100-300 for a release gate, and 500+ mixed production traces once real usage exists. Keep separate slices for factual lookup, API docs, fresh/news, comparison, multi-hop, Deep Research, and adversarial/no-answer cases.

SignalRAG includes a small end-to-end benchmark runner. There are two useful 50-case suites:

extended: a golden regression suite. It intentionally repeats known topics and expected sources, so it is good for catching regressions but should not be presented as representative user traffic.
realistic: a short-query suite with no include-domain allowlists. It uses everyday search-box phrasing such as python read json file, tsa liquids rule carry on, and git rebase vs merge. This is still hand-curated, not production telemetry.

For truly representative query distributions, use anonymized logs or public datasets such as MS MARCO (questions drawn from anonymized Bing queries), Natural Questions (questions drawn from anonymized, aggregated Google queries), and BEIR (a combination of diverse retrieval tasks).

Golden regression run:

python -m fast_rag.benchmark \
  --api-base http://127.0.0.1:8000 \
  --suite extended \
  --clear-response-cache \
  --timeout 220 \
  --output benchmark_results/signalrag-benchmark-2026-05-12-50cases.json

Latest golden 50-case local run with DeepSeek enabled, HTML search fallbacks, no Brave API key, and response cache cleared at the start:

| Metric | Result |
| --- | --- |
| Cases | 50 |
| Expected source recall | 0.9067 |
| Used source recall | 0.7567 |
| Answer term coverage | 0.7533 |
| Citation coverage | 0.8139 |
| Supported claim rate | 0.7906 |
| Review claim rate | 0.2042 |
| CRAG sufficient rate | 0.9600 |
| Fallback rate | 0.0400 |
| Cache hit rate | 0.0400 |
| Average latency | 15.4s |
| P95 latency | 41.3s |

Interpretation: the larger suite keeps P95 latency at the previous 6-case level while lowering average latency to 15.4s. Expected-source recall stays above 0.90, citation coverage remains above 0.81, and only 2 of 50 cases fell back to extractive synthesis. Cache hits are intentionally low in this cold-start extended suite because the cases are broader paraphrases rather than repeated queries. A warm-cache repeat of the same 50 cases reached 100% response-cache hit rate with about 4ms average API wall time.

Realistic short-query run:

python -m fast_rag.benchmark \
  --api-base http://127.0.0.1:8000 \
  --suite realistic \
  --clear-response-cache \
  --timeout 220 \
  --output benchmark_results/signalrag-benchmark-2026-05-12-realistic-50cases.json

Latest realistic 50-case local run with DeepSeek enabled, DuckDuckGo/Bing/Yahoo HTML search fallbacks, no Brave API key, and response cache cleared at the start:

| Metric | Result |
| --- | --- |
| Cases | 50 |
| Source-scored cases | 50 |
| Expected source recall | 0.9733 |
| Used source recall | 0.9467 |
| Answer term coverage | 0.9800 |
| Citation coverage | 0.8818 |
| Supported claim rate | 0.8656 |
| Review claim rate | 0.1320 |
| CRAG sufficient rate | 0.9800 |
| Fallback rate | 0.0000 |
| Cache hit rate | 0.0000 |
| Average latency | 7.4s |
| P95 latency | 11.7s |

Interpretation: the realistic suite is harsher and more useful for product work. It exposed that DuckDuckGo-only HTML search can return zero results or bad results for short real queries, so SignalRAG now queries Bing and Yahoo HTML fallbacks in parallel and avoids truncating one provider's results before fusion. SignalRAG also applies authority-aware query rewrites, trust-aware pre-fetch reranking, and a small high-confidence official source router for navigational documentation queries. The source router now covers vertical authority sources such as Git, PostgreSQL, AWS, IETF/OAuth, W3C, OWASP, FTC, CFPB, SEC, NIH, NASA, NOAA, and Perplexity docs. This lifted expected-source recall from 0.60 to 0.9733.

The answer layer now prioritizes primary/official evidence in the final answer context and conservatively augments citations when a primary source directly supports a cited claim. That lifted used-source recall from 0.32 to 0.9467 and supported-claim rate from 0.7230 to 0.8656 on the realistic suite.

Smart Cache

SignalRAG uses three cache layers:

Page cache: fetched web pages are persisted in SQLite for reuse during retrieval and reranking.
Planner cache: query plans are cached by normalized query and mode, so repeat requests do not call the lightweight planner model again.
Response cache: final API responses are persisted in SQLite and reused for exact or safe fuzzy matches.

The response cache is conservative about what counts as a match, while still aiming for a high hit rate:

Canonical exact hits ignore case, punctuation, whitespace, and domain-list ordering.
Safe fuzzy hits allow light rewording such as singular/plural variants and reordered wording, but only when mode, lens, filters, locale, verifier, intent tags, and numeric tokens still match.
Fresh queries, day/week recency, and queries containing current/latest/today wording do not use fuzzy cache matches.
Cached responses return meta.cache_hit, meta.cache_strategy, meta.cache_similarity, meta.cache_age_seconds, and meta.cache_source_query.
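
The canonical exact-match rule and the freshness guard above could be approximated as follows. This is a sketch under stated assumptions, not the shipped cache: the function names, the SHA-256 key format, and the exact word list are hypothetical, and planner-detected "fresh" queries are not modeled here.

```python
import hashlib
import json
import re

FRESH_WORDS = {"current", "latest", "today"}


def canonical_query(query):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    without_punctuation = re.sub(r"[^\w\s]", " ", query.lower())
    return " ".join(without_punctuation.split())


def response_cache_key(payload):
    """Build a key that ignores case, punctuation, spacing, and domain order."""
    normalized = dict(payload)
    normalized["query"] = canonical_query(payload["query"])
    for field in ("include_domains", "exclude_domains"):
        normalized[field] = sorted(d.lower() for d in payload.get(field, []))
    blob = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def fuzzy_match_allowed(payload):
    """Disallow fuzzy hits for day/week recency or current/latest/today wording."""
    if payload.get("recency") in {"day", "week"}:
        return False
    return not (FRESH_WORDS & set(canonical_query(payload["query"]).split()))


first = {"query": "How does ChatGPT search work?", "mode": "pro",
         "include_domains": ["openai.com", "help.openai.com"]}
second = {"query": "  how does chatgpt   search work ", "mode": "pro",
          "include_domains": ["help.openai.com", "openai.com"]}
print(response_cache_key(first) == response_cache_key(second))  # True
```

Because mode, lens, filters, and locale stay inside the hashed payload, two requests only share a key when every one of those controls matches.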

Design Notes

Accuracy comes from grounding every answer in retrieved passages, checking retrieval quality before generation, fusing multi-query search results with RRF, ranking with contextual BM25 signals, packing answer context with query-aware compression, inheriting paragraph-level citations during claim checks, and returning only used citations by default. Speed comes from short timeouts, request concurrency, page caching, persistent smart response caching, planner caching, adaptive citation judging, early reranking, compression before generation, and the planner choosing the cheapest mode that fits the query. For production, use a paid search API such as Brave or Tavily, add an embedding or cross-encoder reranker, and persist traces for evaluation.
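
To make "contextual BM25" concrete, the sketch below prepends each passage's source title and domain to its text before applying a standard Okapi BM25 score. It is an illustration only: the field names, the parameters `k1=1.5` and `b=0.75`, and the tokenizer are assumptions, and SignalRAG's scorer combines additional source-quality signals not shown here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contextual_bm25(query, passages, k1=1.5, b=0.75):
    """Score passages with BM25 over title + domain + text (sketch)."""
    if not passages:
        return []
    # Source context is added to every passage before scoring.
    docs = [tokenize(f"{p['title']} {p['domain']} {p['text']}") for p in passages]
    avg_len = sum(len(d) for d in docs) / len(docs)
    doc_freq = Counter(term for d in docs for term in set(d))
    query_terms = set(tokenize(query))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            df = doc_freq[term]
            idf = math.log(1 + (len(docs) - df + 0.5) / (df + 0.5))
            length_norm = k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


passages = [
    {"title": "DeepSeek API Docs", "domain": "api-docs.deepseek.com",
     "text": "Set the base URL and send a request."},
    {"title": "Random blog", "domain": "example.com",
     "text": "Set the base URL and send a request."},
]
scores = contextual_bm25("deepseek api base url", passages)
print(scores)
```

Both passages have identical body text, so any difference in score comes from the source context alone; the official documentation page matches more query terms once its title and domain are included.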
