Memory Path Engine is intentionally small in v0. The goal is to validate the memory model before scaling infrastructure.
This document focuses on the current implementation architecture. For the higher-level design intent, principles, and roadmap, see vision.md.
This file answers:
- what core objects exist in the current implementation
- how retrieval is assembled today
- which extension points are already exposed
This file does not try to fully explain the long-term motivation for the project. That belongs in vision.md.
`MemoryNode`: a typed unit of memory content.
Suggested first-class fields:
- `id`
- `type`
- `content`
- `attributes`
- `importance`
- `risk`
- `novelty`
- `confidence`
- `usage_count`
- `decay_factor`
- `source_ref`
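As an illustration, these fields might be modeled as a dataclass. This is a hedged sketch, not the actual class in the codebase; the types and defaults are assumptions:

```python
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class MemoryNode:
    """Sketch of a typed unit of memory content (types/defaults are assumptions)."""
    id: str
    type: str                         # e.g. "clause", "fact", "event"
    content: str
    attributes: dict[str, Any] = field(default_factory=dict)
    importance: float = 0.5           # matters beyond lexical dominance
    risk: float = 0.0                 # flags risky or exception-bearing content
    novelty: float = 0.0
    confidence: float = 1.0
    usage_count: int = 0              # bumped on each retrieval hit
    decay_factor: float = 1.0         # multiplier applied as the memory ages
    source_ref: Optional[str] = None  # stable pointer back to source material

node = MemoryNode(id="n1", type="clause", content="Payment due in 30 days.")
```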
`MemoryEdge`: a typed relationship between two nodes.
Suggested first-class fields:
- `from_id`
- `to_id`
- `edge_type`
- `weight`
- `confidence`
- `bidirectional`
- `source_ref`
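A matching sketch for the edge, again with assumed types and defaults:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MemoryEdge:
    """Sketch of a typed relationship between two nodes (types are assumptions)."""
    from_id: str
    to_id: str
    edge_type: str                    # e.g. "supports", "contradicts", "refines"
    weight: float = 1.0               # traversal weight used during expansion
    confidence: float = 1.0
    bidirectional: bool = False
    source_ref: Optional[str] = None

edge = MemoryEdge(from_id="n1", to_id="n2", edge_type="supports", weight=0.8)
```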
`SourceRef`: a stable pointer back to source material.
It should support:
- file path
- section or clause identifier
- optional character span or line span
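One possible shape for such a pointer, as a frozen dataclass; the field names and types here are assumptions:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class SourceRef:
    """Sketch of a stable pointer back to source material."""
    file_path: str
    section: Optional[str] = None                 # section or clause identifier
    char_span: Optional[tuple[int, int]] = None   # optional character span
    line_span: Optional[tuple[int, int]] = None   # optional line span

ref = SourceRef(file_path="contracts/msa.md", section="7.2", line_span=(120, 134))
```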
`MemoryPath`: a replayable explanation of how retrieval moved from query to evidence.
Minimum fields:
- `query`
- `steps`
- `supporting_evidence`
- `final_answer`
- `final_score`
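A hedged sketch of these fields, with a hypothetical `PathStep` helper for individual hops (not a class named in the source):

```python
from dataclasses import dataclass, field

@dataclass
class PathStep:
    """One hop in a replayable retrieval path (hypothetical shape)."""
    node_id: str
    reason: str      # why retrieval moved here, e.g. "semantic hit", "edge: supports"
    score: float

@dataclass
class MemoryPath:
    """Sketch of a replayable retrieval explanation."""
    query: str
    steps: list[PathStep] = field(default_factory=list)
    supporting_evidence: list[str] = field(default_factory=list)
    final_answer: str = ""
    final_score: float = 0.0

path = MemoryPath(query="When is payment due?")
path.steps.append(PathStep(node_id="n1", reason="semantic hit", score=0.91))
path.final_score = 0.91
```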
```mermaid
flowchart TD
    UserQuery[UserQuery] --> CandidateSearch[CandidateSearch]
    MemoryStore[MemoryStore] --> CandidateSearch
    CandidateSearch --> ScoreNodes[ScoreNodes]
    ScoreNodes --> ExpandNeighbors[ExpandNeighbors]
    ExpandNeighbors --> BuildPaths[BuildPaths]
    BuildPaths --> RankPaths[RankPaths]
    RankPaths --> FinalResult[FinalResult]
```
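The stages in this flow can be read as a single function pipeline. The sketch below stubs every stage with toy logic (token overlap for candidate search, a stored importance value as the score, one-step paths) purely to show the shape of the flow, not the real implementation:

```python
def retrieve(query: str, store: list[dict]) -> list[dict]:
    """Toy sketch of the v0 flow: CandidateSearch -> ScoreNodes ->
    ExpandNeighbors -> BuildPaths -> RankPaths, with every stage stubbed."""
    # CandidateSearch: keep nodes sharing at least one token with the query.
    tokens = query.lower().split()
    candidates = [n for n in store if any(t in n["content"].lower() for t in tokens)]
    # ScoreNodes: stand-in score taken from a stored importance value.
    scored = [{**n, "score": n.get("importance", 0.5)} for n in candidates]
    # ExpandNeighbors / BuildPaths: stubbed as a one-step path per node.
    paths = [{"steps": [n["id"]], "final_score": n["score"]} for n in scored]
    # RankPaths: highest score first.
    return sorted(paths, key=lambda p: p["final_score"], reverse=True)

store = [
    {"id": "n1", "content": "payment due in 30 days", "importance": 0.9},
    {"id": "n2", "content": "late payment penalty", "importance": 0.4},
]
result = retrieve("payment due", store)
```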
The v0 retrieval stack is now split into explicit extension points:
- `EmbeddingProvider`: produces query and node embeddings
- `EmbeddingTopKRetriever`: runs semantic candidate generation
- `ScoringStrategy`: converts semantic hits plus memory weights into ranked path steps
- `WeightedGraphRetriever`: combines candidate search, neighbor expansion, and path replay
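These extension points could be expressed as structural protocols. A hedged sketch with a toy provider to show the wiring; the method names and signatures are assumptions, not the real interfaces:

```python
from typing import Protocol, Sequence

class EmbeddingProvider(Protocol):
    """Produces query and node embeddings (signature is an assumption)."""
    def embed(self, texts: Sequence[str]) -> list[list[float]]: ...

class ScoringStrategy(Protocol):
    """Converts a semantic hit plus memory weights into a ranked score."""
    def score(self, semantic_score: float, node: dict) -> float: ...

class HashEmbedding:
    """Toy provider: deterministic 1-d pseudo-embeddings, only for wiring tests."""
    def embed(self, texts: Sequence[str]) -> list[list[float]]:
        return [[float(hash(t) % 97) / 97.0] for t in texts]

provider: EmbeddingProvider = HashEmbedding()
vecs = provider.embed(["hello", "world"])
```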
The first scoring function is intentionally simple:
```
final_score = semantic_score * semantic_weight
            + structural_score * structural_weight
            + anomaly_score * anomaly_weight
            + importance_score * importance_weight
```
Where:
- `semantic_score` is provided by the active embedding backend
- `structural_score` rewards traversable supporting edges
- `anomaly_score` rewards nodes marked as risky, conflicting, unusual, or exception-bearing
- `importance_score` rewards nodes that matter more even if they are not lexically dominant
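Written out in code, the weighted sum is a direct linear combination. The weight values below are illustrative, not the project's defaults:

```python
def final_score(semantic: float, structural: float, anomaly: float,
                importance: float, weights: dict[str, float]) -> float:
    """Weighted linear combination of the four component scores."""
    return (semantic * weights["semantic"]
            + structural * weights["structural"]
            + anomaly * weights["anomaly"]
            + importance * weights["importance"])

# Illustrative weights: semantics dominates, structure and the rest refine.
weights = {"semantic": 0.5, "structural": 0.2, "anomaly": 0.15, "importance": 0.15}
score = final_score(semantic=0.8, structural=1.0, anomaly=0.0,
                    importance=0.6, weights=weights)
```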
The core should stay domain-agnostic. Domain packs should provide:
- ingestion conventions
- node typing rules
- edge typing rules
- weight heuristics
- evaluation tasks
In the current codebase, this starts with a small `DomainPack` abstraction and a registry-backed example pack for contract-like benchmark documents. The intent is to let future packs supply their own ingestion and graph-building logic without rewriting the retrieval core.
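A minimal sketch of what a registry-backed pack might look like, assuming a frozen dataclass and a module-level registry; the field names and ingestion logic here are illustrative, not the real `DomainPack` API:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class DomainPack:
    """Hedged sketch of a domain pack (fields are assumptions)."""
    name: str
    node_types: tuple[str, ...]
    edge_types: tuple[str, ...]
    ingest: Callable[[str], list[dict]]   # raw text -> node dicts

_REGISTRY: dict[str, DomainPack] = {}

def register_pack(pack: DomainPack) -> None:
    _REGISTRY[pack.name] = pack

def get_pack(name: str) -> DomainPack:
    return _REGISTRY[name]

register_pack(DomainPack(
    name="example_contract_pack",
    node_types=("clause", "definition"),
    edge_types=("refines", "conflicts_with"),
    # Toy ingestion convention: one node per blank-line-separated paragraph.
    ingest=lambda text: [{"type": "clause", "content": p}
                         for p in text.split("\n\n") if p.strip()],
))
nodes = get_pack("example_contract_pack").ingest("Clause A.\n\nClause B.")
```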
Current example packs:
- `example_contract_pack` (with `contract_pack` kept as a backward-compatible alias)
- `example_runbook_pack`
Future candidates:
- `code_pack`
- `research_pack`
- `support_pack`
The repository starts with these conceptual modes:
- `lexical_baseline`: plain lexical retrieval without structure.
- `embedding_baseline`: embedding-based retrieval without graph expansion.
- `structure_only`: retrieval with node and edge awareness, but no extra weighting.
- `weighted_graph`: retrieval with structure, weighting, and replayable paths.
- `activation_spreading_v1`: seed selection plus explicit activation propagation along edges with decay and thresholds.
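The propagation step behind `activation_spreading_v1` can be sketched as follows; the decay, threshold, and hop-limit values are assumptions chosen for illustration, not the project's actual parameters:

```python
def spread_activation(seeds: dict[str, float],
                      edges: list[tuple[str, str, float]],
                      decay: float = 0.5,
                      threshold: float = 0.1,
                      max_hops: int = 2) -> dict[str, float]:
    """Propagate activation from seed nodes along weighted edges, applying a
    decay per hop and dropping contributions below the threshold."""
    activation = dict(seeds)
    frontier = dict(seeds)  # nodes activated in the previous hop
    for _ in range(max_hops):
        next_frontier: dict[str, float] = {}
        for src, dst, weight in edges:
            if src in frontier:
                contrib = frontier[src] * weight * decay
                if contrib >= threshold:
                    activation[dst] = activation.get(dst, 0.0) + contrib
                    next_frontier[dst] = max(next_frontier.get(dst, 0.0), contrib)
        frontier = next_frontier
        if not frontier:  # activation died out below the threshold
            break
    return activation

edges = [("a", "b", 0.8), ("b", "c", 0.6)]
acts = spread_activation({"a": 1.0}, edges)
```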
v0 uses an in-memory store so iteration stays fast.
Later storage backends can include:
- sqlite
- graph database
- vector store
- hybrid graph plus vector backends
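One way to keep those backends swappable is a small storage protocol that the v0 in-memory store already satisfies. This is a hypothetical interface, not the project's actual `MemoryStore` API:

```python
from typing import Optional, Protocol

class MemoryStoreBackend(Protocol):
    """Hypothetical storage interface future backends could implement."""
    def put(self, node_id: str, node: dict) -> None: ...
    def get(self, node_id: str) -> Optional[dict]: ...

class InMemoryStore:
    """The v0-style backend: a plain dict, so iteration stays fast."""
    def __init__(self) -> None:
        self._nodes: dict[str, dict] = {}

    def put(self, node_id: str, node: dict) -> None:
        self._nodes[node_id] = node

    def get(self, node_id: str) -> Optional[dict]:
        return self._nodes.get(node_id)

store: MemoryStoreBackend = InMemoryStore()
store.put("n1", {"content": "hello"})
```

A sqlite or vector-store backend would implement the same two methods against its own persistence layer, leaving retrievers unchanged.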
The repository is beginning to separate responsibilities into clearer bounded contexts:
- memory core: node, edge, path, retrieval, and domain-pack abstractions
- structured benchmark: strongly typed benchmark datasets, fixture loading, and evaluation services
This split is meant to support DDD-style evolution: domain concepts stay explicit, and application services orchestrate them without collapsing everything into utility modules.
v1 adds an explicit palace domain under `src/memory_engine/memory/`:
- Domain: `MemoryPalace`, `PalaceSpace`, typed memories (`EpisodicMemory`, `SemanticMemory`, `RouteMemory`), `MemoryLink`, `DomainMemoryState`, and `MemoryStateMachine`.
- Bridge: `palace_to_store`/`store_to_palace` map between v1 objects and the existing `MemoryStore` plus `MemoryNode`/`MemoryEdge`, so all current retrievers keep working.
- Recall layering: `PalaceRecallResult` holds `retrieved_memories`, `routes`, and `activation_snapshot`, derived from a legacy `RetrievalResult` via `RetrievalResult.palace_result` (filled by retrievers in `retrieve.py`). Public benchmarks prefer this list when ranking session-like items.
- Retriever construction: `build_legacy_retriever` in `retrieval_factory.py` is shared by the benchmark service and `RetrieveMemoryService`, avoiding import cycles with the runner.
- Dynamic lifecycle on nodes: `MemoryStatePolicy` still mutates `MemoryWeight`, and also writes `lifecycle_state`, `reinforcement_count`, and `stability_score` on `MemoryNode.attributes` using the v1 state machine.
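As a toy illustration of how a state policy might write those lifecycle fields into `MemoryNode.attributes`: the state names, threshold, and increment below are invented for this sketch and do not reflect the actual `MemoryStateMachine` transitions:

```python
def reinforce(attributes: dict, amount: float = 0.1) -> dict:
    """Sketch: one reinforcement event updates count, stability, and state.
    Values and state names are assumptions, not the real state machine."""
    attributes["reinforcement_count"] = attributes.get("reinforcement_count", 0) + 1
    stability = min(1.0, attributes.get("stability_score", 0.0) + amount)
    attributes["stability_score"] = stability
    # Invented two-state lifecycle: "forming" until stability crosses 0.5.
    attributes["lifecycle_state"] = "stable" if stability >= 0.5 else "forming"
    return attributes

attrs: dict = {}
for _ in range(5):
    reinforce(attrs)
```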
Legacy contracts (`MemoryPath`, `RetrievalResult.paths`, structured benchmark reports) remain stable; v1 is additive until callers migrate to palace-first APIs.