Problem
When embedding_device is set to hash or lexical (the local lexical fallback), search can return low-signal vector hits instead of obvious lexical matches.
Observed query:
mempalace reindex database hnsw embeddings
The MCP/vector path returned unrelated entries such as stopword/i18n files and NUL-byte-like documents, while SQLite FTS/BM25 found relevant HNSW/reindex rows immediately.
Root Cause
Hash embeddings are a local lexical fallback, but the search path still treats them like dense semantic embeddings. That makes vector distance a weak ranking signal for some real queries. The SQLite FTS fallback also selected candidates without ORDER BY rank, so low-rowid partial matches could crowd out better candidates before BM25 reranking.
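The candidate-selection issue can be illustrated with a minimal sketch using Python's sqlite3 module. It assumes an SQLite build with FTS5 enabled; the table name docs, the column body, the helper fts_candidates, and the sample rows are hypothetical and are not taken from the project. With a small LIMIT and no ordering, the rows returned are whichever the engine yields first; with ORDER BY rank, FTS5 sorts by its built-in BM25 score so the strongest match is kept.

```python
import sqlite3

# Hypothetical schema: a single FTS5 table named "docs" with one text column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(body)")
conn.executemany(
    "INSERT INTO docs(body) VALUES (?)",
    [
        ("stopword list for the i18n database",),                # rowid 1, weak partial match
        ("database migration notes",),                           # rowid 2, weak partial match
        ("reindex the hnsw database after embeddings change",),  # rowid 3, strong match
    ],
)


def fts_candidates(conn, match_query, limit, ranked=True):
    """Return up to `limit` (rowid, body) candidates for an FTS5 MATCH query."""
    # "rank" is the hidden FTS5 column; by default it holds the BM25 score,
    # where smaller (more negative) values mean better matches.
    order_clause = "ORDER BY rank " if ranked else ""
    sql = f"SELECT rowid, body FROM docs WHERE docs MATCH ? {order_clause}LIMIT ?"
    return conn.execute(sql, (match_query, limit)).fetchall()


match_query = "reindex OR database OR hnsw OR embeddings"
unranked = fts_candidates(conn, match_query, limit=1, ranked=False)
ranked = fts_candidates(conn, match_query, limit=1, ranked=True)
print("without ORDER BY rank:", unranked)
print("with ORDER BY rank:   ", ranked)
```

The order of the unranked result is not guaranteed by SQLite, which is the point: only the ranked query reliably puts the HNSW/reindex row inside the candidate window.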
Expected Behavior
When hash/lexical embedding mode is active, search should use the SQLite BM25 path intentionally. FTS candidate selection should retrieve the best-ranked candidates before local BM25 reranking.
Proposed Fix
- Route MCP and CLI search to SQLite BM25 when embedding_device=hash or lexical
- Add ORDER BY rank to the SQLite FTS candidate query
- Preserve the existing vector path for normal ONNX/dense embedding mode
- Surface the fallback reason in structured search results
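A rough sketch of the routing and fallback-reason parts of this list follows. Every name in it (SearchPlan, plan_search, LEXICAL_DEVICES, the backend labels, and the wording of the reason string) is hypothetical and chosen for illustration; it is not the project's actual API, only one way the decision could be expressed.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: these are the embedding_device values that mean "lexical fallback".
LEXICAL_DEVICES = {"hash", "lexical"}


@dataclass
class SearchPlan:
    backend: str                           # "sqlite_bm25" or "vector" (hypothetical labels)
    fallback_reason: Optional[str] = None  # populated only when the vector path is skipped


def plan_search(embedding_device: str) -> SearchPlan:
    """Pick the search backend for the configured embedding device."""
    device = embedding_device.strip().lower()
    if device in LEXICAL_DEVICES:
        # Hash embeddings are not dense semantic vectors, so vector distance
        # is a weak ranking signal; go straight to SQLite BM25.
        reason = f"embedding_device={device}: using SQLite BM25 instead of vector search"
        return SearchPlan(backend="sqlite_bm25", fallback_reason=reason)
    # Normal ONNX/dense embedding mode keeps the existing vector path.
    return SearchPlan(backend="vector")


for device in ("hash", "lexical", "cpu"):
    print(device, plan_search(device))
```

Keeping the reason on the plan object means both the MCP and CLI callers could copy it into their structured results without re-deriving why the vector path was skipped.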