Skip to content

Latest commit

 

History

History
453 lines (345 loc) · 19.6 KB

File metadata and controls

453 lines (345 loc) · 19.6 KB

SanTOK Cognitive: A Pure Symbolic Knowledge System

Whitepaper v1.0

Author: Santosh Chavala
Date: December 2024


Abstract

SanTOK Cognitive is a knowledge management and reasoning system that operates without neural networks, transformers, or external AI. In an era dominated by Large Language Models (LLMs), SanTOK Cognitive takes a fundamentally different approach: pure symbolic reasoning with template-based verbalization. This paper presents the architecture, algorithms, and philosophy behind SanTOK Cognitive, demonstrating that structured knowledge + symbolic inference can provide explainable, deterministic, and efficient intelligence.


1. Introduction

1.1 The Problem with LLM-Based Systems

Large Language Models have revolutionized AI, but they come with significant drawbacks:

Issue Description
Black Box Cannot explain why a specific answer was generated
Hallucination Generate plausible but factually incorrect information
Resource Intensive Require significant compute (GPU, memory, energy)
Non-Deterministic Same input can produce different outputs
External Dependency Require API access or large model downloads
Privacy Concerns Data may be processed by third parties

1.2 Our Thesis

Structured knowledge + symbolic inference can match or exceed LLM performance for many knowledge-based tasks, while providing complete explainability, determinism, and local operation.

SanTOK Cognitive proves this thesis by implementing:

  • Knowledge graphs for relational knowledge
  • Trees for hierarchical organization
  • Rule-based inference for logical reasoning
  • Template-based verbalization for natural language output

1.3 Key Contributions

  1. A complete cognitive architecture without neural components
  2. Custom algorithms (SanTOKRanker, SanTOK9Scorer, etc.)
  3. Formal invariants ensuring system correctness
  4. Integration framework with existing SanTOK tokenization

2. Design Philosophy

2.1 Core Principles

┌─────────────────────────────────────────────────────────────┐
│                    DESIGN PRINCIPLES                         │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. EXPLAINABILITY FIRST                                    │
│     Every output traces to source facts and rules          │
│                                                             │
│  2. NO EXTERNAL AI                                          │
│     Zero dependency on GPT, BERT, or any neural system     │
│                                                             │
│  3. DETERMINISM                                             │
│     Same input → Same output (except explicit randomness)  │
│                                                             │
│  4. LOCAL OPERATION                                         │
│     All processing happens locally, no API calls           │
│                                                             │
│  5. MODULARITY                                              │
│     Each component works independently                     │
│                                                             │
│  6. SAFETY                                                  │
│     Never modifies existing code (santok_complete)         │
│                                                             │
└─────────────────────────────────────────────────────────────┘

2.2 The Four Pillars of Knowledge

SanTOK Cognitive recognizes that knowledge has multiple dimensions:

Pillar Structure Question Answered Example
Vectors Flat embedding space "What is similar?" Semantic search
Graphs Nodes + Edges "What is connected?" Relation reasoning
Trees Hierarchical "How is it organized?" Taxonomy navigation
Rules IF-THEN statements "What follows logically?" Inference

2.3 The 9-Centric Philosophy

Unique to SanTOK, we use digital root mathematics throughout:

Digital Root: dr(n) = 1 + ((n - 1) mod 9)

This creates:
• Bounded scores (always 1-9)
• Cyclic patterns
• Interpretable meanings:
  1=Origin, 2=Duality, 3=Synthesis, 4=Structure,
  5=Change, 6=Balance, 7=Analysis, 8=Power, 9=Completion

3. Architecture

3.1 System Overview

┌─────────────────────────────────────────────────────────────────────────┐
│                        SANTOK COGNITIVE                                  │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│   INPUT                                                                 │
│   ┌─────────────┐                                                      │
│   │ Text/Query  │                                                      │
│   └──────┬──────┘                                                      │
│          │                                                              │
│          ▼                                                              │
│   ┌─────────────────────────────────────────────────────────────────┐  │
│   │                    KNOWLEDGE LAYER                               │  │
│   │                                                                  │  │
│   │   ┌───────────┐   ┌───────────┐   ┌───────────┐                │  │
│   │   │  GRAPH    │   │   TREE    │   │  MEMORY   │                │  │
│   │   │  STORE    │   │   STORE   │   │ (Unified) │                │  │
│   │   └─────┬─────┘   └─────┬─────┘   └─────┬─────┘                │  │
│   │         └───────────────┼───────────────┘                       │  │
│   └─────────────────────────┼───────────────────────────────────────┘  │
│                             │                                          │
│          ┌──────────────────┼──────────────────┐                       │
│          │                  │                  │                       │
│          ▼                  ▼                  ▼                       │
│   ┌─────────────────────────────────────────────────────────────────┐  │
│   │                   REASONING LAYER                                │  │
│   │                                                                  │  │
│   │   ┌───────────┐   ┌───────────┐   ┌───────────┐                │  │
│   │   │ INFERENCE │   │   PATH    │   │CONTRADICT │                │  │
│   │   │  ENGINE   │   │  FINDER   │   │ DETECTOR  │                │  │
│   │   └───────────┘   └───────────┘   └───────────┘                │  │
│   │                                                                  │  │
│   └─────────────────────────┬───────────────────────────────────────┘  │
│                             │                                          │
│                             ▼                                          │
│   ┌─────────────────────────────────────────────────────────────────┐  │
│   │                  VERBALIZATION LAYER                             │  │
│   │                                                                  │  │
│   │   ┌───────────────────────────────────────────────────────────┐ │  │
│   │   │              SANTOK VERBALIZER                            │ │  │
│   │   │          (Template-based, NO neural LLM)                  │ │  │
│   │   └───────────────────────────────────────────────────────────┘ │  │
│   │                                                                  │  │
│   └─────────────────────────┬───────────────────────────────────────┘  │
│                             │                                          │
│                             ▼                                          │
│   OUTPUT                                                               │
│   ┌─────────────────────────────────────────────────────────────────┐  │
│   │  Answer + Explanation + Confidence + Reasoning Trace            │  │
│   └─────────────────────────────────────────────────────────────────┘  │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

3.2 Module Responsibilities

Module Responsibility Key Classes
graph/ Store entities and relationships GraphNode, GraphEdge, GraphStore
trees/ Hierarchical organization TreeNode, Tree, TreeStore
memory/ Unified knowledge access MemoryObject, UnifiedMemory
reasoning/ Inference and explanation InferenceEngine, RuleBase, Explainer
algorithms/ Custom SanTOK algorithms SanTOKRanker, SanTOK9Scorer, etc.

4. Core Algorithms

4.1 SanTOK Ranking Algorithm

Unlike BM25 or TF-IDF, SanTOKRanker combines multiple signals:

score(q, d) = α·Relevance + β·Connectivity + γ·Hierarchy + δ·Freshness

Where:
  Relevance    = Token overlap with position weighting
  Connectivity = Graph centrality × relation strength
  Hierarchy    = Tree depth weight × parent inheritance
  Freshness    = Temporal decay × access frequency

Default weights: α=0.4, β=0.3, γ=0.2, δ=0.1

Advantages over BM25:

  • Incorporates graph structure
  • Uses tree hierarchy
  • Bounded by 9-centric transformation

4.2 Inference Engine

The inference engine applies 20+ symbolic rules:

TRANSITIVITY RULES:
  A IS_A B, B IS_A C → A IS_A C (conf × 0.95)

INVERSE RULES:
  A HAS_PART B → B PART_OF A (conf × 1.0)

INHERITANCE RULES:
  A IS_A B, B HAS_PART C → A HAS_PART C (conf × 0.80)

SYMMETRY RULES:
  A SIMILAR_TO B → B SIMILAR_TO A (conf × 1.0)

Termination Guarantee:

  1. Maximum iteration limit (default: 100)
  2. Minimum confidence threshold (default: 0.3)
  3. Fixpoint detection (no new facts)
  4. Duplicate prevention

4.3 Pattern-Based Relation Extraction

Instead of neural NER, we use 34 regex patterns:

"(\w+) is a (\w+)"           → IS_A relation
"(\w+) is part of (\w+)"     → PART_OF relation
"(\w+) causes (\w+)"         → CAUSES relation
"(\w+) depends on (\w+)"     → DEPENDS_ON relation

Advantages:

  • No training data required
  • Rules are transparent and editable
  • Works in any domain without fine-tuning

4.4 Template-Based Verbalization

Instead of neural text generation:

Query Type: DEFINITION
Template: "{subject} is {definition}."

Query Type: RELATION
Template: "{subject} is {relation} {object}."

Query Type: PROCESS
Template: "{subject} works by {mechanism}."

Relation Phrases:

IS_A      → "is a type of"
PART_OF   → "is part of"
CAUSES    → "causes"
USES      → "uses"

5. Comparison with LLM-Based Systems

5.1 Feature Comparison

Feature LLM (e.g., GPT-4) SanTOK Cognitive
Explainability ❌ Black box ✅ Full trace
Determinism ❌ Stochastic ✅ Deterministic
Hallucination ❌ Common ✅ Impossible*
Local Operation ⚠️ Possible ✅ Always
Resource Usage ❌ High (GPU) ✅ Low (CPU)
Privacy ⚠️ Depends ✅ Guaranteed
Open-Domain ✅ Any topic ⚠️ Knowledge base
Fluency ✅ Excellent ⚠️ Template-based

*Cannot produce facts not in knowledge base

5.2 When to Use Each

Use SanTOK Cognitive when:

  • Explainability is required
  • Knowledge base is well-defined
  • Determinism is essential
  • Resources are limited
  • Privacy is critical

Use LLM when:

  • Open-domain conversation
  • Creative text generation
  • Handling ambiguous queries
  • No structured knowledge base

5.3 Hybrid Approach

SanTOK Cognitive can enhance LLMs:

                 ┌─────────────────┐
                 │     Query       │
                 └────────┬────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────┐
│                  SANTOK COGNITIVE                            │
│  • Retrieve relevant facts                                  │
│  • Infer new relations                                      │
│  • Check contradictions                                     │
│  • Build structured context                                 │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
              ┌─────────────────────┐
              │  Structured Context │
              │  + Reasoning Paths  │
              └──────────┬──────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│                        LLM                                   │
│           "Speaker, not Thinker"                            │
│  • Convert structured output to fluent text                 │
│  • LLM cannot hallucinate (constrained by context)         │
└─────────────────────────────────────────────────────────────┘

6. Formal Properties

6.1 Key Invariants

SanTOK Cognitive maintains 32 formal invariants:

Graph Invariants:

  • G1: Node ID uniqueness
  • G5: IS_A acyclicity (DAG)
  • G8: SIMILAR_TO/OPPOSITE_OF mutual exclusion

Inference Invariants:

  • I1: Bounded confidence [0, 1]
  • I2: Monotonic confidence decay
  • I3: Guaranteed termination

System Guarantees:

  • S1: No external AI dependency
  • S2: Complete explainability
  • S4: Determinism

6.2 Complexity Guarantees

Operation Time Complexity
Node lookup O(1)
Edge addition O(1)
Neighbor query O(degree)
Inference iteration O(E × R)
Pattern matching O(n × p)

7. Implementation

7.1 Technology Stack

Language:     Python 3.8+
Dependencies: Standard library only
Data Format:  JSON / Pickle
Memory:       In-memory with serialization

7.2 Module Structure

santok_cognitive/
├── graph/           # Knowledge graph (4 files)
├── trees/           # Knowledge trees (3 files)
├── memory/          # Unified memory (2 files)
├── reasoning/       # Reasoning engine (9 files)
├── algorithms/      # Custom algorithms (6 files)
├── integration/     # Bridges (4 files)
└── utils/           # Utilities (3 files)

Total: 35 Python files, 50+ classes

7.3 Usage Example

from santok_cognitive import UnifiedMemory, SanTOKReasoner

# Create memory
memory = UnifiedMemory()

# Add knowledge
memory.add("Python is a programming language", "fact", auto_link_graph=True)
memory.add("Programming languages are tools", "fact", auto_link_graph=True)

# Create reasoner
reasoner = SanTOKReasoner(memory)

# Ask question
answer = reasoner.ask("What is Python?")

print(answer.text)        # "Python is a type of programming language."
print(answer.confidence)  # 0.85
print(answer.explain())   # Full reasoning trace

8. Conclusion

SanTOK Cognitive demonstrates that meaningful AI systems can be built without neural networks. By combining structured knowledge (graphs, trees) with symbolic inference (rules, patterns), we achieve:

  1. Complete explainability - every output is traceable
  2. Guaranteed determinism - reproducible results
  3. Zero hallucination - only facts from knowledge base
  4. Local operation - no external API dependencies
  5. Efficient execution - CPU-only, low memory

This is not a replacement for LLMs, but a complement that excels where LLMs struggle: explainability, determinism, and resource efficiency.

SanTOK Cognitive is 100% unique, 100% original, and 100% explainable.


References

  1. SanTOK Complete - Core tokenization and embedding system
  2. Knowledge Graphs - A Survey (2021)
  3. Symbolic AI vs Neural AI - A Comparison
  4. Digital Root Mathematics and Applications

Appendix A: Complete Algorithm Formulas

See docs/ALGORITHMS_DEEP.md

Appendix B: All Inference Rules

See docs/REASONING_DEEP.md

Appendix C: Formal Invariants

See docs/INVARIANTS.md