All benchmarks come from a production system running on Windows with an RTX 5090 (embedding server) and a TerpHQ gateway. Your numbers will vary based on hardware, vault size, and model choice.
Memory Search Performance
| Metric | nomic-embed-text (768-dim) | Qwen3-VL-Embedding-8B (4096-dim) |
|---|---|---|
| Dimensions | 768 | 4096 |
| MMEB Rank | ~Top 20 | #1 |
| Search latency | ~45 ms | ~45 ms (L1 cache hit) / ~70 ms (cold) |
| Index size (our system) | 73-180 chunks per agent | 24,700+ chunks per agent |
| Files indexed | 5-16 per agent | 947-962 per agent |
| Cache entries | 59-293 | 6,682-41,617 |
| Multimodal | No | Yes (text + image) |
| VRAM required | ~300 MB (Ollama) | ~16 GB (custom Python server) |
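To put the dimension and chunk counts above in perspective, raw vector storage can be estimated as chunks × dimensions × bytes per value. The sketch below assumes float32 vectors (4 bytes per value), which the benchmarks do not state, and it ignores metadata and index overhead. `vector_bytes` is a hypothetical helper for illustration, not part of any tool named here.

```python
def vector_bytes(chunks: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw embedding storage in bytes.

    Assumes dense vectors with a fixed width per value (float32 by default).
    Metadata, text payloads, and index structures are not counted.
    """
    return chunks * dims * bytes_per_value


def to_mib(n_bytes: int) -> float:
    """Convert bytes to mebibytes."""
    return n_bytes / (1024 * 1024)


# Upper end of the nomic-embed-text index reported in the table.
small = vector_bytes(chunks=180, dims=768)
# Lower bound of the Qwen3-VL-Embedding-8B index ("24,700+" chunks).
large = vector_bytes(chunks=24_700, dims=4096)

print(f"768-dim,  180 chunks:    {to_mib(small):.2f} MiB")
print(f"4096-dim, 24,700 chunks: {to_mib(large):.2f} MiB")
```

Under these assumptions the larger index stays in the hundreds of mebibytes per agent, so the ~16 GB VRAM figure is likely dominated by the model itself rather than by the stored vectors.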
Context Optimization
| Metric | Stock OpenClaw | After Optimization |
|---|---|---|
| Workspace context per msg | 15-20 KB | 4-5 KB |
| SOUL.md | 3-5 KB | < 1 KB (772 bytes target) |
| MEMORY.md | 5-15 KB | < 3 KB (pointers only) |
| AGENTS.md | 3-8 KB | < 2 KB (decision tree) |
| TOOLS.md | 2-5 KB | < 1 KB (one-liners) |
| Total injected | 15-30 KB | < 8 KB |
| Token cost per msg (Opus) | ~$0.25 (50K tokens) | ~$0.025 (5K tokens) |
| Response time | 4-8 sec | 1-3 sec |
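The token cost row can be checked with simple arithmetic. The sketch below derives the per-token rate implied by the "before" figure (~$0.25 for 50K tokens) and applies it to the "after" figure. The rate is inferred from the table, not taken from any published price list, and the function names are hypothetical.

```python
def implied_usd_per_million(cost_usd: float, tokens: int) -> float:
    """Rate in USD per million tokens implied by one (cost, tokens) pair."""
    return cost_usd * 1_000_000 / tokens


def cost_per_message(tokens: int, usd_per_million: float) -> float:
    """Cost of a single message at a flat per-token rate."""
    return tokens * usd_per_million / 1_000_000


# Rate implied by the stock configuration: ~$0.25 for ~50K tokens.
rate = implied_usd_per_million(cost_usd=0.25, tokens=50_000)

# Apply the same rate to the optimized configuration (~5K tokens).
optimized = cost_per_message(tokens=5_000, usd_per_million=rate)

print(f"implied rate: ${rate:.2f} per million tokens")
print(f"optimized cost per message: ${optimized:.3f}")
```

Because the rate is flat in this model, a 10x reduction in tokens per message gives a 10x reduction in cost, which matches the two figures in the table.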
Memory Bridge Impact
| Metric | Without Bridge | With Bridge |
|---|---|---|
| Vault chunks available to Codex | 0 | 8-18 per task |
| Past mistake repetition | Frequent | Rare |
| Architecture alignment | Guesses | Follows established patterns |
| Iteration rounds needed | 3-5 | 1-2 |
| First-attempt quality | ~60% | ~85% |
Self-Improving System Compounding
| Week | Corrections Logged | Mistakes Avoided | HOT.md Entries | Promoted to AGENTS.md |
|---|---|---|---|---|
| 1 | ~5 | 0 | 0 | 0 |
| 2 | ~12 | ~3 | ~3 | 0 |
| 4 | ~25 | ~10 | ~8 | ~2 |
| 8 | ~40 | ~25 | ~12 | ~5 |
| 12 | ~50 | ~35+ | ~15 | ~8 |
(Projected based on daily usage patterns. Actual numbers will vary.)
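One way to read the compounding effect in the projection is the ratio of mistakes avoided to corrections logged for each week. The sketch below computes that ratio from the approximate table values; "~35+" is treated as 35, and since the inputs are projections, the ratios are illustrative only.

```python
# (week, corrections logged, mistakes avoided), approximate values from the table.
# "~35+" for week 12 is taken at its lower bound of 35.
projection = [
    (1, 5, 0),
    (2, 12, 3),
    (4, 25, 10),
    (8, 40, 25),
    (12, 50, 35),
]


def avoided_ratio(corrections: int, avoided: int) -> float:
    """Mistakes avoided per correction logged (0.0 if nothing is logged)."""
    return avoided / corrections if corrections else 0.0


ratios = [avoided_ratio(c, a) for _, c, a in projection]

for (week, _, _), ratio in zip(projection, ratios):
    print(f"week {week:>2}: {ratio:.1%} avoided per correction logged")
```

The ratio rises week over week in the projected data, which is the sense in which the corrections are expected to compound.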