Whitepaper v1.0
Author: Santosh Chavala
Date: December 2024
SanTOK Cognitive is a knowledge management and reasoning system that operates without neural networks, transformers, or external AI. In an era dominated by Large Language Models (LLMs), SanTOK Cognitive takes a fundamentally different approach: pure symbolic reasoning with template-based verbalization. This paper presents the architecture, algorithms, and philosophy behind SanTOK Cognitive, demonstrating that structured knowledge + symbolic inference can provide explainable, deterministic, and efficient intelligence.
Large Language Models have revolutionized AI, but they come with significant drawbacks:
| Issue | Description |
|---|---|
| Black Box | Cannot explain why a specific answer was generated |
| Hallucination | Generate plausible but factually incorrect information |
| Resource Intensive | Require significant compute (GPU, memory, energy) |
| Non-Deterministic | Same input can produce different outputs |
| External Dependency | Require API access or large model downloads |
| Privacy Concerns | Data may be processed by third parties |
Structured knowledge + symbolic inference can match or exceed LLM performance for many knowledge-based tasks, while providing complete explainability, determinism, and local operation.
SanTOK Cognitive sets out to support this thesis by implementing:
- Knowledge graphs for relational knowledge
- Trees for hierarchical organization
- Rule-based inference for logical reasoning
- Template-based verbalization for natural language output
- A complete cognitive architecture without neural components
- Custom algorithms (SanTOKRanker, SanTOK9Scorer, etc.)
- Formal invariants ensuring system correctness
- Integration framework with existing SanTOK tokenization
┌─────────────────────────────────────────────────────────────┐
│ DESIGN PRINCIPLES │
├─────────────────────────────────────────────────────────────┤
│ │
│ 1. EXPLAINABILITY FIRST │
│ Every output traces to source facts and rules │
│ │
│ 2. NO EXTERNAL AI │
│ Zero dependency on GPT, BERT, or any neural system │
│ │
│ 3. DETERMINISM │
│ Same input → Same output (except explicit randomness) │
│ │
│ 4. LOCAL OPERATION │
│ All processing happens locally, no API calls │
│ │
│ 5. MODULARITY │
│ Each component works independently │
│ │
│ 6. SAFETY │
│ Never modifies existing code (santok_complete) │
│ │
└─────────────────────────────────────────────────────────────┘
SanTOK Cognitive recognizes that knowledge has multiple dimensions:
| Pillar | Structure | Question Answered | Example |
|---|---|---|---|
| Vectors | Flat embedding space | "What is similar?" | Semantic search |
| Graphs | Nodes + Edges | "What is connected?" | Relation reasoning |
| Trees | Hierarchical | "How is it organized?" | Taxonomy navigation |
| Rules | IF-THEN statements | "What follows logically?" | Inference |
A distinctive feature of SanTOK is its use of digital root mathematics throughout:
Digital Root: dr(n) = 1 + ((n - 1) mod 9), for integers n ≥ 1
This creates:
• Bounded scores (always 1-9)
• Cyclic patterns
• Interpretable meanings:
1=Origin, 2=Duality, 3=Synthesis, 4=Structure,
5=Change, 6=Balance, 7=Analysis, 8=Power, 9=Completion
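The closed-form digital root can be sketched in a few lines of Python. This is a minimal illustration, not the SanTOK source: the names `digital_root`, `interpret`, and `MEANINGS` are hypothetical, and the function assumes positive integers.

```python
# Hypothetical helper names; the meanings table is copied from the list above.
MEANINGS = {
    1: "Origin", 2: "Duality", 3: "Synthesis", 4: "Structure", 5: "Change",
    6: "Balance", 7: "Analysis", 8: "Power", 9: "Completion",
}


def digital_root(n: int) -> int:
    """Return dr(n) = 1 + ((n - 1) mod 9) for a positive integer n."""
    if n < 1:
        raise ValueError("digital_root expects a positive integer")
    return 1 + (n - 1) % 9


def interpret(n: int) -> str:
    """Map a positive integer to the meaning of its digital root."""
    return MEANINGS[digital_root(n)]
```

For instance, 38 reduces as 3 + 8 = 11, then 1 + 1 = 2, and the closed form gives the same result: 1 + (37 mod 9) = 2, which maps to "Duality".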
┌─────────────────────────────────────────────────────────────────────────┐
│ SANTOK COGNITIVE │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ INPUT │
│ ┌─────────────┐ │
│ │ Text/Query │ │
│ └──────┬──────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ KNOWLEDGE LAYER │ │
│ │ │ │
│ │ ┌───────────┐ ┌───────────┐ ┌───────────┐ │ │
│ │ │ GRAPH │ │ TREE │ │ MEMORY │ │ │
│ │ │ STORE │ │ STORE │ │ (Unified) │ │ │
│ │ └─────┬─────┘ └─────┬─────┘ └─────┬─────┘ │ │
│ │ └───────────────┼───────────────┘ │ │
│ └─────────────────────────┼───────────────────────────────────────┘ │
│ │ │
│ ┌──────────────────┼──────────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ REASONING LAYER │ │
│ │ │ │
│ │ ┌───────────┐ ┌───────────┐ ┌───────────┐ │ │
│ │ │ INFERENCE │ │ PATH │ │CONTRADICT │ │ │
│ │ │ ENGINE │ │ FINDER │ │ DETECTOR │ │ │
│ │ └───────────┘ └───────────┘ └───────────┘ │ │
│ │ │ │
│ └─────────────────────────┬───────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ VERBALIZATION LAYER │ │
│ │ │ │
│ │ ┌───────────────────────────────────────────────────────────┐ │ │
│ │ │ SANTOK VERBALIZER │ │ │
│ │ │ (Template-based, NO neural LLM) │ │ │
│ │ └───────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ └─────────────────────────┬───────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ OUTPUT │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ Answer + Explanation + Confidence + Reasoning Trace │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
| Module | Responsibility | Key Classes |
|---|---|---|
| graph/ | Store entities and relationships | GraphNode, GraphEdge, GraphStore |
| trees/ | Hierarchical organization | TreeNode, Tree, TreeStore |
| memory/ | Unified knowledge access | MemoryObject, UnifiedMemory |
| reasoning/ | Inference and explanation | InferenceEngine, RuleBase, Explainer |
| algorithms/ | Custom SanTOK algorithms | SanTOKRanker, SanTOK9Scorer, etc. |
Unlike BM25 or TF-IDF, which score documents on term statistics alone, SanTOKRanker combines four signals:
score(q, d) = α·Relevance + β·Connectivity + γ·Hierarchy + δ·Freshness
Where:
Relevance = Token overlap with position weighting
Connectivity = Graph centrality × relation strength
Hierarchy = Tree depth weight × parent inheritance
Freshness = Temporal decay × access frequency
Default weights: α=0.4, β=0.3, γ=0.2, δ=0.1
Advantages over BM25:
- Incorporates graph structure
- Uses tree hierarchy
- Bounded by 9-centric transformation
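The weighted sum above can be sketched as follows. This is an illustration under stated assumptions, not the SanTOKRanker implementation: the four signals are taken as precomputed values in [0, 1], the names `Signals`, `combined_score`, and `to_nine_scale` are hypothetical, and the linear mapping onto 1-9 is only one possible reading of the "9-centric transformation".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    """Precomputed per-document signals, each assumed to lie in [0, 1]."""
    relevance: float
    connectivity: float
    hierarchy: float
    freshness: float


# Default weights from the text: alpha, beta, gamma, delta.
DEFAULT_WEIGHTS = (0.4, 0.3, 0.2, 0.1)


def combined_score(s: Signals, weights=DEFAULT_WEIGHTS) -> float:
    """Weighted sum of the four signals."""
    alpha, beta, gamma, delta = weights
    return (alpha * s.relevance + beta * s.connectivity
            + gamma * s.hierarchy + delta * s.freshness)


def to_nine_scale(score: float) -> int:
    """ASSUMPTION: clamp to [0, 1], then map linearly onto the integers 1..9."""
    clamped = min(1.0, max(0.0, score))
    return 1 + round(clamped * 8)
```

Because the default weights sum to 1, the combined score stays in [0, 1] whenever every signal does, which keeps the final mapping bounded.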
The inference engine applies 20+ symbolic rules:
TRANSITIVITY RULES:
A IS_A B, B IS_A C → A IS_A C (conf × 0.95)
INVERSE RULES:
A HAS_PART B → B PART_OF A (conf × 1.0)
INHERITANCE RULES:
A IS_A B, B HAS_PART C → A HAS_PART C (conf × 0.80)
SYMMETRY RULES:
A SIMILAR_TO B → B SIMILAR_TO A (conf × 1.0)
Termination Guarantee:
- Maximum iteration limit (default: 100)
- Minimum confidence threshold (default: 0.3)
- Fixpoint detection (no new facts)
- Duplicate prevention
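A minimal forward-chaining sketch shows how the four termination safeguards fit together. It covers only the four example rules listed above, and it makes two assumptions that the text does not specify: the function name `infer` is hypothetical, and a rule with two premises uses the smaller premise confidence before applying the rule factor.

```python
from typing import Dict, Tuple

Fact = Tuple[str, str, str]  # (subject, relation, object)


def infer(facts: Dict[Fact, float],
          max_iterations: int = 100,
          min_confidence: float = 0.3) -> Dict[Fact, float]:
    """Apply the example rules until no new fact appears (fixpoint)."""
    known = dict(facts)
    for _ in range(max_iterations):           # maximum iteration limit
        new: Dict[Fact, float] = {}

        def propose(fact: Fact, conf: float) -> None:
            # Minimum confidence threshold and duplicate prevention.
            if conf < min_confidence or fact in known:
                return
            new[fact] = max(conf, new.get(fact, 0.0))

        for (a, rel, b), c1 in known.items():
            if rel == "HAS_PART":             # inverse rule
                propose((b, "PART_OF", a), c1 * 1.0)
            elif rel == "SIMILAR_TO":         # symmetry rule
                propose((b, "SIMILAR_TO", a), c1 * 1.0)
            elif rel == "IS_A":
                for (b2, rel2, c), c2 in known.items():
                    if b2 != b or c == a:
                        continue
                    if rel2 == "IS_A":        # transitivity rule
                        propose((a, "IS_A", c), min(c1, c2) * 0.95)
                    elif rel2 == "HAS_PART":  # inheritance rule
                        propose((a, "HAS_PART", c), min(c1, c2) * 0.80)

        if not new:                           # fixpoint detection
            break
        known.update(new)
    return known
```

Every rule factor is at most 1.0, so a derived fact never has higher confidence than its premises. Repeated derivations therefore either fall below the threshold or stop producing new facts.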
Instead of neural NER, we use 34 regex patterns:
"(\w+) is a (\w+)" → IS_A relation
"(\w+) is part of (\w+)" → PART_OF relation
"(\w+) causes (\w+)" → CAUSES relation
"(\w+) depends on (\w+)" → DEPENDS_ON relation
Advantages:
- No training data required
- Rules are transparent and editable
- Works in any domain without fine-tuning
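The four example patterns can be exercised with Python's standard `re` module. This is a sketch, not the SanTOK extractor: the names `PATTERNS` and `extract_relations` are hypothetical, and only the four patterns shown above are included, not the full set of 34.

```python
import re
from typing import List, Tuple

# (compiled pattern, relation label) pairs taken from the examples above.
PATTERNS = [
    (re.compile(r"(\w+) is part of (\w+)"), "PART_OF"),
    (re.compile(r"(\w+) is a (\w+)"), "IS_A"),
    (re.compile(r"(\w+) causes (\w+)"), "CAUSES"),
    (re.compile(r"(\w+) depends on (\w+)"), "DEPENDS_ON"),
]


def extract_relations(text: str) -> List[Tuple[str, str, str]]:
    """Return (subject, relation, object) triples found in the text."""
    triples = []
    for pattern, relation in PATTERNS:
        for match in pattern.finditer(text):
            triples.append((match.group(1), relation, match.group(2)))
    return triples
```

Note that `\w+` captures a single word, so with these exact patterns "Python is a programming language" yields the object "programming". Multi-word entities would need wider capture groups.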
Instead of neural text generation:
Query Type: DEFINITION
Template: "{subject} is {definition}."
Query Type: RELATION
Template: "{subject} {relation} {object}."
Query Type: PROCESS
Template: "{subject} works by {mechanism}."
Relation Phrases:
IS_A → "is a type of"
PART_OF → "is part of"
CAUSES → "causes"
USES → "uses"
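Template filling of this kind can be sketched with `str.format`. The names `TEMPLATES`, `RELATION_PHRASES`, and `verbalize` are hypothetical, not the SanTOK Verbalizer API. The relation template here has no leading "is", because the relation phrases already carry their own verb.

```python
# Relation phrases copied from the list above.
RELATION_PHRASES = {
    "IS_A": "is a type of",
    "PART_OF": "is part of",
    "CAUSES": "causes",
    "USES": "uses",
}

# One template per query type.
TEMPLATES = {
    "DEFINITION": "{subject} is {definition}.",
    "RELATION": "{subject} {relation} {object}.",
    "PROCESS": "{subject} works by {mechanism}.",
}


def verbalize(query_type: str, **slots: str) -> str:
    """Fill the template for query_type with the given slot values."""
    if query_type == "RELATION":
        # Replace the relation label (e.g. IS_A) with its surface phrase.
        slots["relation"] = RELATION_PHRASES[slots["relation"]]
    return TEMPLATES[query_type].format(**slots)
```

Calling `verbalize("RELATION", subject="Python", relation="IS_A", object="programming language")` returns "Python is a type of programming language."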
| Feature | LLM (e.g., GPT-4) | SanTOK Cognitive |
|---|---|---|
| Explainability | ❌ Black box | ✅ Full trace |
| Determinism | ❌ Stochastic | ✅ Deterministic |
| Hallucination | ❌ Common | ✅ Impossible* |
| Local Operation | ❌ API access or large model download | ✅ Always |
| Resource Usage | ❌ High (GPU) | ✅ Low (CPU) |
| Privacy | ❌ Data may reach third parties | ✅ Guaranteed |
| Open-Domain | ✅ Any topic | ❌ Limited to knowledge base |
| Fluency | ✅ Excellent | Template-based |
*Cannot produce facts not in knowledge base
Use SanTOK Cognitive when:
- Explainability is required
- Knowledge base is well-defined
- Determinism is essential
- Resources are limited
- Privacy is critical
Use an LLM when:
- Open-domain conversation is needed
- Creative text generation is the goal
- Queries are ambiguous
- No structured knowledge base exists
SanTOK Cognitive can enhance LLMs:
┌─────────────────┐
│ Query │
└────────┬────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ SANTOK COGNITIVE │
│ • Retrieve relevant facts │
│ • Infer new relations │
│ • Check contradictions │
│ • Build structured context │
└────────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────┐
│ Structured Context │
│ + Reasoning Paths │
└──────────┬──────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ LLM │
│ "Speaker, not Thinker" │
│ • Convert structured output to fluent text │
│ • LLM cannot hallucinate (constrained by context) │
└─────────────────────────────────────────────────────────────┘
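The hand-off between the two stages in the diagram can be sketched as a function that serializes retrieved and inferred facts into a context string. The layout of that string and the name `build_structured_context` are assumptions for illustration. The sketch stops before any LLM call, since no specific model API is described here.

```python
from typing import List, Tuple

# (subject, relation phrase, object, confidence)
Fact = Tuple[str, str, str, float]


def build_structured_context(query: str, facts: List[Fact]) -> str:
    """Serialize the query and supporting facts, highest confidence first."""
    lines = [f"QUESTION: {query}", "FACTS (answer using only these):"]
    ranked = sorted(facts, key=lambda fact: -fact[3])  # stable sort
    for index, (subj, rel, obj, conf) in enumerate(ranked, start=1):
        lines.append(f"{index}. {subj} {rel} {obj} (confidence {conf:.2f})")
    return "\n".join(lines)
```

Sorting by confidence keeps the output deterministic for a given fact list, so the symbolic stage stays reproducible even when the downstream generator is not.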
SanTOK Cognitive maintains 32 formal invariants, including:
Graph Invariants:
- G1: Node ID uniqueness
- G5: IS_A acyclicity (DAG)
- G8: SIMILAR_TO/OPPOSITE_OF mutual exclusion
Inference Invariants:
- I1: Bounded confidence [0, 1]
- I2: Monotonic confidence decay
- I3: Guaranteed termination
System Guarantees:
- S1: No external AI dependency
- S2: Complete explainability
- S4: Determinism
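Two of these invariants lend themselves to short executable checks. The sketch below shows how G5 (IS_A acyclicity) and I1 (bounded confidence) could be verified. The function names are hypothetical, and the depth-first search is a standard cycle-detection technique, not necessarily the one SanTOK uses.

```python
from typing import Dict, Iterable, List, Tuple


def is_a_acyclic(edges: Iterable[Tuple[str, str]]) -> bool:
    """G5: return True if the (child, parent) IS_A edges form a DAG."""
    graph: Dict[str, List[str]] = {}
    for child, parent in edges:
        graph.setdefault(child, []).append(parent)

    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state: Dict[str, int] = {}

    def visit(node: str) -> bool:
        state[node] = IN_PROGRESS
        for nxt in graph.get(node, []):
            status = state.get(nxt, UNSEEN)
            if status == IN_PROGRESS:      # back edge: a cycle exists
                return False
            if status == UNSEEN and not visit(nxt):
                return False
        state[node] = DONE
        return True

    return all(visit(n) for n in list(graph) if state.get(n, UNSEEN) == UNSEEN)


def confidence_in_bounds(confidences: Iterable[float]) -> bool:
    """I1: every confidence value lies in [0, 1]."""
    return all(0.0 <= c <= 1.0 for c in confidences)
```

The recursive search is fine for a sketch. A very deep IS_A chain would call for an iterative version to stay within Python's recursion limit.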
| Operation | Time Complexity |
|---|---|
| Node lookup | O(1) |
| Edge addition | O(1) |
| Neighbor query | O(degree) |
| Inference iteration | O(E × R) |
| Pattern matching | O(n × p) |
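The first three rows follow from a dictionary-backed adjacency list, as the sketch below illustrates. `TinyGraphStore` is a hypothetical stand-in, not the `GraphStore` class from the `graph/` module. The O(1) figures are average-case costs for Python dictionaries and amortized costs for list appends.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class TinyGraphStore:
    """Hypothetical minimal store: dict for nodes, adjacency lists for edges."""

    def __init__(self) -> None:
        self.nodes: Dict[str, str] = {}
        self.out_edges: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def add_node(self, node_id: str, label: str) -> None:
        if node_id in self.nodes:          # node ID uniqueness (invariant G1)
            raise ValueError(f"duplicate node id: {node_id}")
        self.nodes[node_id] = label

    def get_node(self, node_id: str) -> str:
        return self.nodes[node_id]         # node lookup: O(1) average

    def add_edge(self, source: str, relation: str, target: str) -> None:
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist")
        self.out_edges[source].append((relation, target))  # O(1) amortized

    def neighbors(self, node_id: str) -> List[str]:
        # Neighbor query: O(degree), one step per outgoing edge.
        return [target for _, target in self.out_edges.get(node_id, [])]
```

Appending to a per-node list keeps edge addition constant-time. The trade-off is that duplicate-edge detection would need an extra set per node.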
Language: Python 3.8+
Dependencies: Standard library only
Data Format: JSON / Pickle
Memory: In-memory with serialization
santok_cognitive/
├── graph/ # Knowledge graph (4 files)
├── trees/ # Knowledge trees (3 files)
├── memory/ # Unified memory (2 files)
├── reasoning/ # Reasoning engine (9 files)
├── algorithms/ # Custom algorithms (6 files)
├── integration/ # Bridges (4 files)
└── utils/ # Utilities (3 files)
Total: 35 Python files, 50+ classes
from santok_cognitive import UnifiedMemory, SanTOKReasoner
# Create memory
memory = UnifiedMemory()
# Add knowledge
memory.add("Python is a programming language", "fact", auto_link_graph=True)
memory.add("Programming languages are tools", "fact", auto_link_graph=True)
# Create reasoner
reasoner = SanTOKReasoner(memory)
# Ask question
answer = reasoner.ask("What is Python?")
print(answer.text) # "Python is a type of programming language."
print(answer.confidence) # 0.85
print(answer.explain()) # Full reasoning trace
SanTOK Cognitive demonstrates that meaningful AI systems can be built without neural networks. By combining structured knowledge (graphs, trees) with symbolic inference (rules, patterns), we achieve:
- Complete explainability - every output is traceable
- Guaranteed determinism - reproducible results
- Zero hallucination - only facts from knowledge base
- Local operation - no external API dependencies
- Efficient execution - CPU-only, low memory
This is not a replacement for LLMs, but a complement that excels where LLMs struggle: explainability, determinism, and resource efficiency.
SanTOK Cognitive is 100% unique, 100% original, and 100% explainable.
- SanTOK Complete - Core tokenization and embedding system
- Knowledge Graphs - A Survey (2021)
- Symbolic AI vs Neural AI - A Comparison
- Digital Root Mathematics and Applications
See docs/ALGORITHMS_DEEP.md
See docs/REASONING_DEEP.md
See docs/INVARIANTS.md