LLM, VLM, Snake, SAT, Charles, Moncey, Architect, Json, Concierge, Matching, Calc, Diff, Synthax, Google, Computation, ML — plus Extraction + Outlook for memory-augmented document workflows, and MonceOS for brick-ready composition. One SDK, zero config — chat, classify, and compute with no API key.
from monceai import Charles, Matching, Calc, Extraction, Outlook, Synthax
Charles("6x7") # → "42" (boolean arithmetic)
Calc("123x3456") # → "425088" (exact Decimal)
Matching("LGB Menuiserie", factory_id=4) # → client #60689 (89% conf)
Matching("44.2 rTherm", factory_id=4) # → article #63442 (100% conf)
# v1.2.4 — deep reasoning flagship, $12/query budget
s = Synthax("design auth for a glass factory portal")
str(s) # TL;DR (≤ 3 sentences, Haiku-compacted)
s.answer # exhaustive Sonnet synthesis
s.job.stages # recall → plan → draft → adversary → revise → arbiter → notify
# v1.2.0 — memory-augmented extraction
ex = Extraction("quote.pdf", user_id="7a3f9b2c", auto_memory=True)
ex.lines # structured rows
ex.trust # {"score": 98, "routing": "AUTO_APPROVE"}
ex.insights # Haiku-distilled bullets written back to memory
ol = Outlook(user_id="7a3f9b2c", auto_memory=True)
ol.extract_email(attachments=[pdf_bytes], subject="Devis VIP", body="comme d'hab")
ol.recall("VIP cloisonneur patterns")

Matching(text) auto-routes client vs article in one parallel call.
No need to specify field= — the server races both paths and returns
the higher-confidence match. Extraction / Outlook ship the full
reflex loop: recall prior memories → extract → distill insights → remember.
Charles Dana · Monce SAS · April 2026 · Paper
pip install git+https://github.com/Monce-AI/monceai-sdk.git@v2.3.0
python quickstart.py # interactive menu — pick any feature

quickstart.py ships at the repo root and runs live demos for every
shipped feature. Each demo prints the exact code you'd write next to the
output you'd see — copy-paste straight into your project.
python quickstart.py # full menu
python quickstart.py charles # jump to one feature
python quickstart.py calc # works offline, zero network
python quickstart.py snakeaudit # audit-driven feature ranking demo
python quickstart.py all # back-to-back tour, every demo

Available demos:

| feature | what it shows | network |
|---|---|---|
| charles | smart router, 4 engines race | yes |
| json | structured dict-subclass output | yes |
| matching | factory-driven canonical IDs | yes |
| calc | exact NP arithmetic, 0 tokens | no |
| diff | raw vs monceai-enhanced | yes |
| document | drop a file, ask a question | yes |
| session | persistent chat with memory | yes |
| architect | ASCII diagrams on demand | yes |
| snake | SAT classifier (needs SNAKE_API_KEY) | yes |
| snakeaudit | audit-driven feature ranking | no |
| sat | CNF solver (needs SAT_API_KEY) | yes |

Most demos are key-free. Snake and SAT print a clean "skipped — set
the key first" instead of erroring out, so you can run the full tour
even without API keys.
pip install git+https://github.com/Monce-AI/monceai-sdk.git

Zero dependencies beyond requests. No API key needed for LLM/VLM/Charles.

| Module | Auth | Backend | Cost |
|---|---|---|---|
| LLM() | None | monceapp.aws.monce.ai | Free |
| VLM() | None | monceapp.aws.monce.ai | Free |
| Charles() / Moncey() | None | monceapp.aws.monce.ai | Free |
| Json() / Concierge() | None | monceapp / concierge.aws.monce.ai | Free |
| Matching() / Calc() / Diff() | None | monceapp.aws.monce.ai | Free |
| Extraction() / Outlook() | user_id only | selfservice.aws.monce.ai | Free |
| Snake() | None | snakebatch.aws.monce.ai | Free |
| SAT() | SAT_API_KEY | npdollars.aws.monce.ai | Per-invocation |

from monceai import LLM
r = LLM("6x7") # default: charles-science
r = LLM("factor 10403", model="charles-auma") # boolean maximization
r = LLM("what is pi?", model="haiku") # fast + cheap
r = LLM("morning bruv", model="charles") # full charles pipeline
r.text # "42"
r.json # parsed dict (when charles-json)
r.ok # True if successful
r.elapsed_ms # wall clock
r.sat_memory # compute receipt (formula, evals, services fired)

| Shorthand | Engine | Latency | Cost/msg |
|---|---|---|---|
| charles | 4x parallel (mem+csv+cnf+sudoku) → Sonnet | 8-15s | ~$0.01 |
| charles-auma | Haiku encode → AUMA {0,1}^n → Haiku | 3-8s | ~$0.003 |
| charles-science | Snake router → 7 services → Sonnet | 15-60s | ~$0.01 |
| charles-json | Memory → Sonnet strict JSON, VLM | 5-15s | ~$0.01 |
| charles-architect | Memory → Sonnet ASCII diagrams | 5-15s | ~$0.01 |
| concise | charles → Haiku TL;DR | 10-20s | ~$0.01 |
| cc | charles ∥ concise → synthesis | 12-25s | ~$0.02 |
| moncey | Glass sales agent (snake.aws + moncesuite) | 2-5s | ~$0.003 |
| concierge | Knowledge base + Snake tools | 3-10s | ~$0.005 |

General-purpose extraction + factory-driven matching, optionally augmented with charles or concierge memory.

| Shorthand | What it does | Memory |
|---|---|---|
| monolith | Bedrock Sonnet + factory context (extract/describe) | — |
| matching | Client ∥ article race, picks higher confidence | — |
| charles-monolith | monolith + charles.aws memory prefix | charles |
| charles-matching | matching + Haiku re-arbitration on memory | charles |
| concierge-monolith | monolith + concierge.aws search results | concierge |
| concierge-matching | matching + Haiku re-arbitration on memory | concierge |

Real benchmark (prompt: "44.2 rTherm", factory 4):
- matching: 1.7s, snake_sat, 50% confidence
- charles-matching: 10s, 1185c memory, 95% via memory_arbitration
- concierge-matching: 4.3s, 1874c memory, 95% via memory_arbitration

Memory arbitration fires automatically when the primary match confidence is below 0.85 and a memory prefix exists. Haiku re-scores candidates using the recalled context.
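
The trigger condition above can be sketched in a few lines. This is an illustration only, not the server's code: `should_arbitrate` is a hypothetical helper, and the sole facts taken from the text are the 0.85 threshold and the requirement that a memory prefix exists.

```python
def should_arbitrate(confidence: float, memory_prefix: str,
                     threshold: float = 0.85) -> bool:
    """Return True when a memory-backed re-score would fire.

    Assumption: an empty or whitespace-only prefix counts as "no memory".
    """
    return confidence < threshold and bool(memory_prefix.strip())


# The benchmark row above: plain matching came back at 50% with memory available.
print(should_arbitrate(0.50, "x" * 1185))  # low confidence + memory
print(should_arbitrate(0.95, "x" * 1185))  # already confident
print(should_arbitrate(0.50, ""))          # nothing recalled
```

Keeping the gate as a pure function makes it easy to log why a given query was or was not re-scored.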

| Shorthand | Model | Tools | Vision | Latency |
|---|---|---|---|---|
| sonnet | Sonnet 4.6 | ✅ | ✅ | 1-3s |
| sonnet4 | Sonnet 4 | ✅ | ✅ | 2-4s |
| haiku | Haiku 4.5 | ✅ | ✅ | 1-2s |
| nova-pro | Nova Pro | — | ✅ | 0.8s |
| nova-lite | Nova Lite | — | — | 0.7s |
| nova-micro | Nova Micro | — | — | 0.6s |

from monceai import VLM
# Images — multipart to the backend
r = VLM("what is in this image?", image=open("photo.png", "rb").read())
# Unified `file=` — path, Path, bytes, or file-like. Any doctype.
# Binary (.pdf/.png/.docx/...) → multipart.
# Text-like (.txt/.json/.csv/.md/.ndjson/...) → inlined into the prompt,
# so even text-only endpoints can "see" the file.
r = VLM("extract all glass fields", file="quote.pdf")
r = VLM("parse the order", file="order.json")
r = VLM("summarise", file=open("notes.md", "rb"))
r = VLM("what's wrong?", file=pdf_bytes, filename="q.pdf")
r.text # raw response
r.json # parsed dict

All five eyed classes take the same file= argument:
VLM, LLM, Json, Charles, and LLMSession.send.
from monceai import Charles
c = Charles()
# Auto-routes to the best sub-model
c("6x7") # → charles-auma (math)
c("is K4 3-colorable?") # → charles-science (SAT)
c("list 5 primes", strategy="json") # → charles-json
# Explicit sub-model calls
c.math("minimize x^2 - 4x + 4")
c.science("solve this sudoku: 530070000...")
c.json("list the planets")
c.vlm("describe", image=img_bytes)
# Parallel strategy — fire multiple models, take the best
c("explain gravity", strategy="deep") # charles + charles-science in parallel

from monceai import Moncey
Moncey("44.2 Silence/16 alu gris/4 rFloat JPP")
# → "Bonjour, j'ai identifié: Feuilleté 44.2 + Intercalaire 16mm..."
# Client mode — parallel futures
m = Moncey()
a = m("44.2 feuillete LowE 16mm")
b = m("devis 20 vitrages")
print(a) # blocks on read

Pipeline: snake.aws/comprendre (deterministic glass decomp) → moncesuite.aws/comprendre (10 classifiers if quality < 75%) → Haiku synthesis. Default factory_id=3 (Monce).
from monceai import Architect
# Blocking — str subclass, IS the diagram
schema = Architect("auth service: users, sessions, api keys")
print(schema) # boxed ASCII ERD
# File in — diagram an existing spec
Architect("diagram this module", file="monceai/llm.py")
# Client mode — reusable, parallel futures
a = Architect()
s1 = a("postgres schema for glass factory orders")
s2 = a("sequence diagram: OAuth2 PKCE flow")
print(s1) # blocks on first read
schema.result.elapsed_ms # LLMResult metadata

Backed by charles-architect on monceapp.aws.monce.ai — every response is
a diagram (ERD, class diagram, sequence, flowchart, system architecture).
from monceai import Json
Json("list 5 primes") # → {"primes": [2, 3, 5, 7, 11]}
Json('{"broken: json}') # → fixes it
Json("nom: Charles, age: 26") # → {"nom": "Charles", "age": 26}
# File in — text-like files are inlined, binaries go multipart.
Json("extract the order", file="order.txt")
Json("list the items", file="quote.pdf")
Json("parse this", file=open("items.csv", "rb"))
j = Json("3 colors with hex")
j["colors"] # list access
print(j) # json.dumps(indent=2)

A file + a question, one line. Document wraps the charles family
(Charles / Concierge / charles-json) behind a single ergonomic surface.
Pass prompt= at construction and str(doc) is the answer.
from monceai import Document
# One-shot — file + prompt → str via __str__
answer = str(Document("quote.pdf", prompt="what's the intercalaire?"))
# → "44.2 rTherm with 16mm TPS noir, per the VIP cloisonneur profile…"
# Multi-question — instantiate once, ask many times
doc = Document("spec.xlsx")
doc.ask("what's the total number of lines?")
doc.ask("any deadline mentioned?", model="concierge") # memory-backed
doc.extract("list all glass lines as JSON") # → dict via Json
# Concierge mode — binary files are pre-transcribed through charles-json,
# then the extracted text is handed to Concierge for a memory-backed answer.
str(Document("devis_VIP.pdf",
prompt="is this the usual pattern for this client?",
model="concierge"))

Accepts paths, pathlib.Path, raw bytes, or any file-like with .read().
Text-like payloads (.txt, .md, .csv, .json…) are inlined into the
prompt; binaries (.pdf, .xlsx, .png, .docx…) go out as multipart.
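
A minimal sketch of that routing decision, assuming it is driven by file extension alone. `route_file` and `TEXT_LIKE` are hypothetical names, the suffix set is limited to the extensions named in this README (the real list is longer), and sending unknown extensions as multipart is an assumption.

```python
from pathlib import Path

# Text-like suffixes named in this README; the SDK's actual list is longer.
TEXT_LIKE = {".txt", ".md", ".csv", ".json", ".ndjson"}


def route_file(filename: str) -> str:
    """Return "inline" for text-like payloads, "multipart" for everything else."""
    suffix = Path(filename).suffix.lower()
    return "inline" if suffix in TEXT_LIKE else "multipart"


for name in ("order.json", "notes.md", "quote.pdf", "spec.xlsx"):
    print(name, "->", route_file(name))
```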
Playground integration. Drag the red Document chip onto the canvas —
or just drop a file anywhere on the canvas and a Document node is spawned
at the drop point with the file pre-attached. Wire its output into any
downstream Charles / Concierge / Arbiter node.

| Mode | Backend | Best for |
|---|---|---|
| model="charles" (default) | charles-json via Charles auto-routing | Single questions, VLM included |
| model="concierge" | charles-json → Concierge | Memory-backed answers on documents |
| model="charles-json" | charles-json direct | Structured JSON / .extract() workflows |

Lives at monceapp.aws.monce.ai/playground.
from monceai import Concierge
Concierge("what's the accuracy for VIP today?") # ask
Concierge("VIP uses warm edge TPS noir as default") # teach
# Memory management
Concierge.remember("44.2 rTherm is standard for Riou")
Concierge.search("rTherm")
Concierge.forget("old pricing info")
Concierge.digest() # daily digest
Concierge.kpi(days=7, factory_id=4) # KPIs

Backend: concierge.aws.monce.ai. Sonnet + memory + Snake tools + email signals.
from monceai import LLMSession
s = LLMSession(model="charles")
r1 = s.send("my name is Charles")
r2 = s.send("what is my name?") # remembers context

Constructor-to-resolution: Matching(arg, ...) blocks and is the result.
Four input forms, one class, no fuzzy. /batch and /batch_client on
snake.aws are the source of truth; snake's own candidate list is re-ranked
locally (CPU only); an optional LLM arbitration band on [0.6, 0.95)
uses monceapp Haiku + concierge in parallel, agreement-gated.
from monceai import Matching
# 1. single article
Matching("44.2 rTherm", factory_id=4, field="verre")
# → {"kind": "article", "num_article": "63442",
# "denomination": "44.2 rTherm", "confidence": 1.0, "method": "snake_exact"}
# 2. array — one /batch call, results in input order
r = Matching(["44.2 rTherm", "4/16/4", "SGG Planitherm"],
factory_id=4, field="verre")
r["stats"] # {n, matched_rate, mean_confidence, by_tier}
r.items_list # [{...}, {...}, {...}] in input order
# 3. client by free text (auto-parses nom / siret / email)
Matching("LGB Menuiserie SAS, SIRET 552 100 554 00025", factory_id=4)
# → {"kind": "client", "numero_client": "9232", "nom": "LGB MENUISERIE",
# "confidence": 0.98, "method": "snake_exact"}
# 4. document → client (pdf / image / docx / eml / msg)
from pathlib import Path
Matching(Path("quote.pdf"), factory_id=4)
# → claude.aws/stage_0 → client_infos + matched client
# 5. auto mode (no field, no kind)
Matching("Riou Group", factory_id=4) # → client
Matching("44.2 rTherm WE noir", factory_id=4) # → article
# Ambiguous? Fires both in parallel, returns higher-confidence winner.
# 6. reusable client — deferred futures, parallel across cores
m = Matching(factory_id=4)
a = m("44.2 rTherm", field="verre")
b = m("LGB")
a["num_article"] # blocks until ready

Enable use_llm=True for the ambiguous middle. Below 0.6 = garbage, not rescuable.
At/above 0.95 = seamless passthrough. In between, monceapp Haiku and
concierge.aws/chat vote independently — only agreement mutates the pick.
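
The agreement gate can be sketched as a pure function. This is a reading of the rule above, not the SDK's implementation: `arbitrate` is a hypothetical name, the two votes stand in for the Haiku and concierge answers, and `"unchanged"` is an illustrative label. Only `"llm_arb_agree"` and the `[0.6, 0.95)` band come from this README.

```python
from typing import Optional, Tuple


def arbitrate(pick: str, confidence: float,
              vote_a: Optional[str], vote_b: Optional[str],
              low: float = 0.6, high: float = 0.95) -> Tuple[str, str]:
    """Return (final_pick, method) for one locally re-ranked candidate."""
    if not (low <= confidence < high):
        # Below the floor or already confident: the arbiters are not consulted.
        return pick, "unchanged"
    if vote_a is not None and vote_a == vote_b:
        # Both arbiters independently named the same candidate.
        return vote_a, "llm_arb_agree"
    # Disagreement (or a missing vote) leaves the local pick untouched.
    return pick, "unchanged"


print(arbitrate("63442", 0.70, "98219", "98219"))
print(arbitrate("63442", 0.70, "98219", "63442"))
print(arbitrate("63442", 0.97, "98219", "98219"))
```

Requiring agreement means a single confused arbiter can never make the result worse than the CPU-only rerank.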
Matching("Triplevitrage33/2+4+5Trempé", factory_id=4,
use_llm=True, top_k=20)
# → local rerank picks a candidate; if conf ∈ [0.6, 0.95), both arbiters
# vote; agreement promotes the pick to tier 2 with method="llm_arb_agree"

pairs = [("44.2 rTherm", "63442"), ("SGG Planitherm", "98219"), ...]
report = Matching.assess(pairs, factory_id=4, field="global",
use_llm=True, top_k=20)
report["hit_top1"] # 0.853 (85.3% top-1)
report["hit_topk"] # top-k recall
report["above_floor_accuracy"] # accuracy on rows ≥ 0.6 confidence
report["by_method"] # per-method breakdown
report["calibration"] # [(lo, hi, n, hit_rate), ...]
report["failures"] # first 200 wrong picks with method/conf

342 queries across 57 articles × 6 variant kinds (exact / lower /
extra-token / reorder / OCR / nospace), vs live snake.aws/batch:
| Config | top-1 | Tokens | Wall clock |
|---|---|---|---|
| Hosted cascade (snake→haiku→fuzzy) | 40% | — | — |
| Matching v2, CPU only (no fuzzy, no LLM) | 73.7% | 0 | 4.3s |
| Matching v2, top_k=20 + LLM arbitration | 85.3% | 4,400 | ~50s |
Variant breakdown @ 85%: exact 98%, lower 98%, reorder 95%, extra-token 90%,
OCR 83%, nospace 46%. Calibration: [0.95, 1.0) → 97%, [0.8, 0.95) → 91%.
Benchmark source: bench_matching.py.
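
For readers who want to reproduce a calibration line like the one above on their own data, here is a minimal sketch. It is not taken from bench_matching.py: `calibration` is a hypothetical helper, the bucket edges are illustrative, and folding a confidence of exactly 1.0 into the top bucket is an assumption.

```python
def calibration(rows, edges=(0.6, 0.8, 0.95, 1.0)):
    """rows: list of (confidence, is_correct) pairs.

    Returns [(lo, hi, n, hit_rate), ...], skipping empty buckets.
    """
    report = []
    for lo, hi in zip(edges, edges[1:]):
        is_last = hi == edges[-1]
        bucket = [ok for conf, ok in rows
                  if lo <= conf < hi or (is_last and conf == hi)]
        if bucket:
            report.append((lo, hi, len(bucket), sum(bucket) / len(bucket)))
    return report


rows = [(0.97, True), (1.0, True), (0.90, True), (0.85, False), (0.70, False)]
for lo, hi, n, hit_rate in calibration(rows):
    print(f"[{lo}, {hi}) n={n} hit_rate={hit_rate:.0%}")
```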
Article fields for explicit field=: verre, verre1, verre2, verre3,
intercalaire, intercalaire1, intercalaire2, remplissage, gaz,
faconnage, façonnage_arete, global.
Latencies for Matching (from monceai import Matching), measured against live snake.aws:

| Call | Wall clock | Tokens |
|---|---|---|
| Matching("44.2 rTherm", field="verre", factory_id=4) | 86 ms | 0 |
| Matching("ACTIF PVC", factory_id=4) — client path | 41 ms | 0 |
| Matching(["44.2 rTherm", "16 TPS noir", "Argon"], f=4) | 69 ms (3 queries) | 0 |
| Matching("SGG Planitherm", factory_id=4) — auto client+article race | 78 ms | 0 |
| Reusable m = Matching(f=4); m(q1); m(q2) — 2 parallel | 46 ms | 0 |

Same LLM() / LLMSession() API — just point at the named model.
MonceApp wires factory context + memory retrieval automatically.
from monceai import LLM
# General extraction via Sonnet + factory context
LLM("extract fields from this order", model="monolith", factory_id=4,
image=pdf_page_bytes)
# Memory-augmented: charles.aws memory feeds the extraction
LLM("what is rTherm?", model="charles-monolith")
# Memory-augmented: concierge.aws search results feed the extraction
LLM("what is rTherm?", model="concierge-monolith")
# → answer uses remembered corrections + alias mappings

Chat-mode access (live on /v1/chat):
- /monolith, /matching
- /charles-monolith, /charles-matching
- /concierge-monolith, /concierge-matching

from monceai import Calc
Calc("123x3456") # → "425088"
Calc("1000000x1000000") # → "1000000000000"
float(Calc("44.2 * 1000")) # → 44200.0

Calc is a str subclass — the instance IS the result. Decimal-backed,
exact. Operators: x * / % + -.
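
To make "a str subclass backed by Decimal" concrete, here is a small self-contained sketch. It is not Calc's source: `ExactCalc` and `_eval` are hypothetical names, and parsing through Python's `ast` module is an assumption about one reasonable way to do it. Division follows the Decimal context precision (28 digits by default), so non-terminating quotients are rounded in this sketch.

```python
import ast
import operator
import re
from decimal import Decimal

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
        ast.Div: operator.truediv, ast.Mod: operator.mod}


def _eval(node, src):
    """Evaluate a parsed arithmetic expression with Decimal operands."""
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        # Re-read the literal from the source text so 44.2 never passes through float.
        return Decimal(ast.get_source_segment(src, node))
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left, src), _eval(node.right, src))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand, src)
    raise ValueError("unsupported expression")


class ExactCalc(str):
    """The instance IS the result string, like Calc."""

    def __new__(cls, expr):
        # Treat an 'x' sitting between operands as multiplication.
        src = re.sub(r"(?<=[\d\s)])x(?=[\d\s(])", "*", expr.strip())
        value = _eval(ast.parse(src, mode="eval").body, src)
        return super().__new__(cls, format(value, "f"))


print(ExactCalc("123x3456"))
print(float(ExactCalc("44.2 * 1000")))
```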
from monceai import Diff
d = Diff("Quel intercalaire pour 44.2 rTherm?", factory_id=4)
d.raw_text # generic model answer (often wrong)
d.enhanced_text # monceai-enhanced answer (factory-correct)
d.context_tokens_added # cost of enhancement
print(d.report()) # formatted side-by-side

Perfect for proving the value of (monceai-) context to stakeholders.
One-shot: file in, structured data + insights + memory out. Backed by
selfservice.aws.monce.ai, which hosts
the full VLM engine and per-user memory store. No key — just pass a
user_id (8-char opaque token).
from monceai import Extraction
ex = Extraction("quote.pdf", user_id="7a3f9b2c")
ex.lines # list[dict] — extracted rows
ex.trust # {"score": 98, "routing": "AUTO_APPROVE"}
ex.client # {"name": "RIOU GLASS", "id": ..., "match": ...}
ex.header # {"document_type": "devis", "language": "fr", ...}
ex.validation # {"issues": [...], "overall_confidence": 0.92}
ex.task_id # for feedback / audit
ex.duration_ms # end-to-end latency

Accepts a path, raw bytes, or a list of paths/bytes for multi-file:
Extraction(pdf_bytes, filename="order.pdf", user_id="7a3f9b2c")
Extraction(["a.pdf", "b.pdf"], user_id="7a3f9b2c")

With auto_memory=True, a Haiku pass fires after extraction, distilling 1-3
short bullets worth remembering (client patterns, routing quirks, recurring
corrections) and writing them back as memory entries tagged insight. The next
extraction automatically sees them as prior_memories.
ex = Extraction("quote.pdf", user_id="7a3f9b2c", auto_memory=True,
email_subject="Devis VIP urgent",
email_body="Peux-tu traiter comme d'hab?")
ex.insights # ['VIP cloisonneur orders consistently specify warm edge 16mm', ...]
ex.prior_memories # memories surfaced as context for *this* extraction

ex.accept(note="looks right")
ex.reject(reason="wrong client")
ex.correct(line=0, was="44.2 rTherm", should_be="44.2 clair")

Feedback is stored as tagged memory and shows up in downstream recall.

| Document | Pages | Lines | Trust | Routing | Duration |
|---|---|---|---|---|---|
| Safran aerospace PO | 1 | 1 | 100 | AUTO_APPROVE | 14.0s |
| ASICA industrial PO | 1 | 6 | 98 | AUTO_APPROVE | 14.7s |
| Gasket International enquiry | 2 | 1 | 100 | AUTO_APPROVE | 12.1s |
Wall-clock for all three in parallel: 30.5s. Auto-memory surfaced the ASICA context on the third extraction mid-burst.
Higher-level wrapper around Extraction for email / Outlook flows. Ships
remember, recall, forget, history, chat, and extract_email.
from monceai import Outlook
ol = Outlook(user_id="7a3f9b2c", auto_memory=True)
# Extract attachments with full email context (subject + body)
ex = ol.extract_email(
attachments=[pdf_bytes, ("invoice.xlsx", xlsx_bytes)],
subject="Devis cloisonneur VIP",
body="Peux-tu me traiter ça comme d'hab?",
)
ex.lines; ex.insights
# Memory ops
ol.remember("client always wants 44.2 rTherm as intercalaire", tags=["VIP"])
ol.recall("VIP cloisonneur patterns") # keyword-scored
ol.forget("outdated note") # substring match
# History and activity
ol.history(limit=10) # past extractions
ol.memories(limit=50, tag="insight") # memory listing
ol.stats() # {memories, extractions, conversations}
# Chat — Sonnet grounded on this user's memory only
reply = ol.chat("What does this user usually route to VIP?")
reply["reply"]; reply["latency_ms"]

When auto_memory=True, every extract_email() call chains:
recall(subject)
↓
extract(file, context=body)
↓
distill(result, prior=recall_output) ← Haiku
↓
remember(bullets) ← tagged 'insight'
Toggle at runtime: ol.auto_memory = False. Manual mode still auto-logs
the extraction event (just skips the Haiku distillation).
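
The chain above can be mimicked locally with an in-memory store, which helps when reasoning about what each call sees. Everything here is a stand-in: `ReflexLoop` is a hypothetical class, the `extract` and `distill` callables replace the selfservice engine and the Haiku pass, and the keyword recall is a deliberately naive assumption.

```python
class ReflexLoop:
    """recall -> extract -> distill -> remember, with a plain list as memory."""

    def __init__(self, extract, distill, auto_memory=True):
        self.extract = extract          # callable(file, context) -> dict
        self.distill = distill          # callable(result, prior) -> list[str]
        self.auto_memory = auto_memory
        self.memory = []                # list of (tag, text)

    def recall(self, query):
        words = query.lower().split()
        return [text for _, text in self.memory
                if any(word in text.lower() for word in words)]

    def run(self, file, subject, body):
        prior = self.recall(subject)
        result = self.extract(file, body)
        # The extraction event is logged even in manual mode.
        self.memory.append(("extraction", f"extracted {file}"))
        if self.auto_memory:
            for bullet in self.distill(result, prior):
                self.memory.append(("insight", bullet))
        return result, prior


loop = ReflexLoop(
    extract=lambda file, context: {"file": file, "lines": []},
    distill=lambda result, prior: ["VIP orders specify warm edge 16mm"],
)
loop.run("a.pdf", subject="Devis VIP", body="comme d'hab")
_, prior = loop.run("b.pdf", subject="Devis VIP", body="comme d'hab")
print(prior)  # the insight written by the first run
```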
Outlook is a thin client over selfservice.aws.monce.ai
— the full API is documented at /docs.
Memory is isolated per user_id and mirrored to S3 (versioned) for
permanency.
Patterns we actually ran against the live service while validating v1.2.0.
A runnable version lives at examples/extraction_quickstart.py
— python examples/extraction_quickstart.py path/to/file.pdf.
1. Quality probe — one PDF, full reflex loop
from monceai import Extraction, Matching
ex = Extraction(
"quote.pdf",
user_id="7a3f9b2c",
industry="glass",
email_subject="Devis urgent",
email_body="Peux-tu traiter comme d'hab?",
auto_memory=True,
)
# The shape:
assert isinstance(ex, dict) # pretty-prints JSON
assert ex.task_id and ex.duration_ms > 0
assert isinstance(ex.lines, list)
assert isinstance(ex.trust, dict)
# What was extracted:
print(f"vertical : {ex.result['vertical']}")
print(f"client : {ex.client['name']}")
print(f"trust : {ex.trust['score']} ({ex.trust['routing']})")
print(f"lines : {len(ex.lines)}")
# What came back from the reflex loop:
for bullet in ex.insights:
    print(f" insight • {bullet}")
for mem in ex.prior_memories:
    print(f" recalled • {mem[:80]}")
# Independently cross-check the client match (matching lives in monceapp):
cross = Matching(ex.client["name"], factory_id=4)
print(f"cross-check: {cross['nom']} #{cross['numero_client']} conf={cross['confidence']}")

2. Bulk throughput — parallel extractions with ThreadPoolExecutor
import concurrent.futures as cf
from pathlib import Path
from monceai import Extraction
def run_one(path, idx):
    return Extraction(
        path,
        user_id=f"bulk_{idx:04x}",
        email_subject=f"Stress {idx}: {Path(path).name}",
        auto_memory=True,
        timeout=240,
    )

paths = ["a.pdf", "b.pdf", "c.pdf", "d.pdf"] # your files
with cf.ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(run_one, p, i) for i, p in enumerate(paths)]
    for fut in cf.as_completed(futures):
        ex = fut.result()
        print(f"{ex.filename:<30} trust={ex.trust.get('score')} "
              f"routing={ex.trust.get('routing'):<14} {ex.duration_ms}ms")

Keep max_workers ≤ server worker count to avoid queueing. Selfservice
currently runs 20 gunicorn workers on t3.medium — client parallelism of 8
is a safe default.
3. Multi-file synthesis — one extraction from N attachments
from monceai import Outlook
ol = Outlook(user_id="7a3f9b2c", auto_memory=True)
# Pass a list of paths OR raw bytes OR (filename, bytes) tuples.
# Selfservice runs the engine per file and merges the result server-side:
# first successful file → header/client, all lines concat'd with
# _source_file tagging, worst routing wins.
ex = ol.extract_email(
attachments=[
"order.pdf",
("quote.pdf", open("quote.pdf", "rb").read()),
],
subject="Batch upload",
body="Two files, one workflow.",
)
print(f"merged from {len({l.get('_source_file') for l in ex.lines})} files")
print(f"total lines: {len(ex.lines)}")
print(f"worst routing: {ex.trust['routing']}")

4. Memory reflex — sequential calls compound context
from monceai import Outlook
ol = Outlook(user_id="ops_team_01", auto_memory=True)
for path in sorted_pdfs: # e.g. a day's incoming email attachments
    ex = ol.extract_email(attachments=[path], subject=path.name)
    # `ex.prior_memories` grows with each call — the server auto-recalls
    # relevant history BEFORE each extraction and Haiku cross-references
    # it in the insights it writes back.
    if ex.prior_memories:
        print(f" → surfaced {len(ex.prior_memories)} prior memories as context")
# At the end, Sonnet can summarize the entire run from user memory alone.
summary = ol.chat("What pattern emerged across today's orders?")
print(summary["reply"])

5. Feedback — accept / reject / correct
ex = Extraction("quote.pdf", user_id="7a3f9b2c", auto_memory=True)
# All three return a memory entry tagged 'feedback' and persist to disk + S3.
ex.accept(note="looks right")
ex.reject(reason="wrong client — this is ASCA not ASICA")
ex.correct(line=2, field="verre1", was="44.2", should_be="44.2 LowE")
# Feedback is searchable like any other memory:
from monceai import Outlook
ol = Outlook(user_id="7a3f9b2c")
corrections = ol.memories(tag="correct", limit=50)

6. Stats + history + recall
ol = Outlook(user_id="7a3f9b2c")
ol.stats() # {'memories': 50, 'extractions': 16, 'conversations': 3}
ol.history(limit=10) # last 10 extractions with routing + trust
ol.memories(limit=20) # full memory list (optionally tag-filtered)
ol.recall("VIP cloisonneur") # keyword-scored search
ol.forget("outdated pricing") # substring match, returns count deleted

7. Factory-aware /extract pipeline — drop-in replacement for claude.aws
examples/extract_pipeline.py is a single-file
Extract class that assembles Extraction + Outlook + Matching + Json +
Charles into a payload that is byte-compatible with
POST https://claude.aws.monce.ai/extract.
from extract_pipeline import Extract
ex = Extract("quote.pdf", factory_id=4, user_id="7a3f9b2c", industry="glass")
ex["extracted_data"]["value"]["measurements"] # prod schema
ex["extracted_data"]["client_matching"] # {"numero_client", "nom", "confidence"}
ex["metadata"]["routing_decision"] # "auto_approved" | "human_review"
ex.measurements # same as above, convenience accessor

What it actually does:
1. Outlook.recall(q=f"factory_{factory_id}") pulls user-specific priors.
2. Extraction(source, user_id=..., industry="glass", context=...) runs the selfservice VLM lift per document with priors threaded into context.
3. upgrade_matches fires one Matching(..., field=..., factory_id=...) future per (row × field) in parallel; low-confidence hits (<0.75) fall back to Json arbitration over the SDK's top-N candidates.
4. upgrade_client fires 4 parallel Matching futures (nom / logo / raison_sociale / siret), argmax wins.
5. Json cross-doc synthesis when len(sources) > 1.
6. Charles narrates the run for _handle_metadata.agent_summary.
7. Outlook.remember logs the run so the next call for this user recalls what happened.

All prompts live in one file as triple-quoted f-strings keyed off a
FACTORY table — one row per factory_id (1=VIT, 3=Monce, 4=VIP,
9=Eurovitrage, 10=TGVI, 13=VIC), driving prompts, matching fields, and
normalization toggles (spacer color, default gas, IGU decomposition).
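
The "parallel futures, argmax wins" and "below 0.75 falls back" steps of the pipeline can be sketched with the standard library. This is not the code in examples/extract_pipeline.py: `best_match`, `needs_fallback` and `fake_lookup` are hypothetical, and the lookup stands in for a real Matching call.

```python
from concurrent.futures import ThreadPoolExecutor


def best_match(lookup, queries):
    """Fire one lookup per query in parallel and keep the highest-confidence hit."""
    with ThreadPoolExecutor(max_workers=len(queries)) as pool:
        hits = list(pool.map(lookup, queries))
    return max(hits, key=lambda hit: hit["confidence"])


def needs_fallback(hit, floor=0.75):
    """Low-confidence hits are handed to a second-stage arbiter."""
    return hit["confidence"] < floor


# Stand-in for four client lookups keyed by different fields.
scores = {"nom": 0.62, "logo": 0.40, "raison_sociale": 0.91, "siret": 0.98}


def fake_lookup(field):
    return {"field": field, "confidence": scores[field]}


winner = best_match(fake_lookup, list(scores))
print(winner, needs_fallback(winner))
```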
python examples/extract_pipeline.py quote.pdf --factory 4 --user-id 7a3f9b2c
python examples/extract_pipeline.py a.pdf b.pdf order.eml --factory 3 \
--user-id 7a3f9b2c --json

Synthax is a multi-stage reasoning pipeline — each specialist's output
becomes the next's input. The planner (Haiku) picks the chain per
prompt; math skips the Architect, glass inserts Moncey, architecture
inserts the ASCII diagrammer. Adversary (cold Sonnet) attacks the draft;
revise patches the holes. Verify backstops numeric claims with exact
Calc. Arbiter (Sonnet) synthesizes TL;DR + confidence + residual
doubts. Notify writes the verdict back to Concierge so the next run
recalls it.
from monceai import Synthax
# Input: a text prompt. Output: a Haiku-compacted TL;DR with an
# exhaustive Sonnet answer attached.
s = Synthax("design an auth layer for a glass factory portal",
budget_usd=12.0)
# Output shape
str(s) # TL;DR (≤ 3 sentences, ≤ 280 chars)
s.answer # exhaustive Sonnet synthesis
s.job.stages # list[Stage] — full audit timeline
s.job.artifacts # dict{stage_name → text}
s.job.cost_usd # accumulated USD, hard-capped at budget
s.job.elapsed_ms # wall-clock
s.job.confidence # float 0..1 from arbiter
s.job.doubts # list[str] — residual concerns
s.job.arbiter_rationale
print(s.report()) # human-readable stage-by-stage timeline

Reusable client (lazy futures, like Charles/Moncey):
s_client = Synthax()
a = s_client("factor 10403 and prove uniqueness")
b = s_client("design a migration plan for PostgreSQL partitioning")
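print(a, b) # run in parallel, resolve on read

Pipeline shape (the planner may skip stages per bucket):

recall → plan → draft → render → adversary → revise → verify → arbiter → notify

| Stage | Engine | Role |
|---|---|---|
| recall | Concierge | memory search |
| plan | Haiku | picks the chain per prompt |
| draft | specialist (auma / science / glass) | draft |
| render | Architect | ASCII diagram |
| adversary | Sonnet | cold attack |
| revise | Json | patch holes |
| verify | Calc | exact arithmetic |
| arbiter | Sonnet | unified answer |
| notify | Concierge | writeback |
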
Real receipt — Synthax("What is 6x7 and why is it the answer to life?", budget_usd=0.25):
| Stage | Source | Time | Cost |
|---|---|---|---|
| recall | concierge | 72ms | $0.005 |
| plan | haiku | 1,249ms | $0.003 |
| draft | charles | 15,268ms | $0.010 |
| render | — | — | skipped |
| adversary | sonnet | 8,798ms | $0.015 — caught fake dataset |
| revise | charles-json | 14,584ms | $0.010 — patched draft |
| verify | — | — | skipped (no numeric claims) |
| arbiter | sonnet | 9,007ms | $0.015 — TL;DR confidence=0.98 |
| notify | concierge | 0ms | $0.005 |
Total: 9 stages · 49s · $0.063 · confidence 0.98 · 2 residual doubts. Budget hard-cap $0.25 was not exhausted; adversary caught a draft hallucination, revise cleaned it, arbiter delivered a 2-sentence TL;DR within 280 chars.
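
A budget hard cap over a chain of stages can be sketched as follows. This is an illustration of the idea, not Synthax's scheduler: `run_stages` is hypothetical, per-stage costs are assumed to be known up front, and a stage that returns `None` is treated as skipped and free. The costs in the example are the ones from the receipt above.

```python
def run_stages(stages, budget_usd):
    """stages: list of (name, cost_usd, fn), where fn(artifacts) returns text or None.

    Each stage sees the artifacts of the stages before it. The run stops
    before any stage that would push spending past the budget.
    """
    spent, artifacts, timeline = 0.0, {}, []
    for name, cost, fn in stages:
        if spent + cost > budget_usd + 1e-9:
            timeline.append((name, "budget_exhausted"))
            break
        output = fn(artifacts)
        if output is None:
            timeline.append((name, "skipped"))
            continue
        artifacts[name] = output
        spent += cost
        timeline.append((name, "ok"))
    return artifacts, timeline, round(spent, 6)


def skip(artifacts):
    return None


stages = [
    ("recall", 0.005, lambda a: "memories"),
    ("plan", 0.003, lambda a: "chain"),
    ("draft", 0.010, lambda a: "draft v1"),
    ("render", 0.0, skip),
    ("adversary", 0.015, lambda a: "attack on " + a["draft"]),
    ("revise", 0.010, lambda a: "draft v2"),
    ("verify", 0.0, skip),
    ("arbiter", 0.015, lambda a: "TL;DR"),
    ("notify", 0.005, lambda a: "written back"),
]
_, timeline, spent = run_stages(stages, budget_usd=0.25)
print(spent, timeline[-1])
```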
.replay(from_="revise", with_extra="...") resumes the pipeline with
altered context, reusing earlier artifacts as priors.
Google takes any text prompt, hits the live web, and returns a
Haiku-synthesized paragraph with inline [1][2] citations. str(Google(q)) IS
the synthesis, ready to feed downstream.
from monceai import Google
g = Google("prix verre 44.2 rTherm 2026")
# Output
str(g) # "Le verre 44.2 rTherm se situe autour de ... [1][2]"
g.results # [{"title", "url", "snippet"}, ...]
g.raw_html # backend HTML (for debugging)
g.search_ms # search latency (before synthesis)
g.result # LLMResult with tokens + sat_memory

Client mode with lazy parallel futures:
g = Google()
a = g("Kissat SAT solver")
b = g("monceai SDK")
print(a, b) # both resolve in parallel

Chain into Synthax for grounded deep reasoning (RAG in 3 lines):
from monceai import Google, Synthax
ctx = Google("current market price for 44.2 rTherm glass France 2026")
s = Synthax(f"Answer with sources and confidence: "
f"what should I quote a client for 20 units? "
f"Web context:\n{ctx}",
budget_usd=2.0)

Computation is zero-LLM compute for prompts that have a deterministic answer.
It detects factoring / raw DIMACS / pure arithmetic, builds the matching CNF or
Decimal expression, dispatches to npdollars.aws.monce.ai/solve where
Kissat solves the CNF, and returns the verified answer with a proof
certificate.
from monceai import Computation
Computation("factor 10403")
# → "10403 = 101 × 103" (binary-multiplier CNF, Kissat, 0 tokens)
Computation("factor 2027")
# → "2027 is prime (UNSAT on binary-multiplier CNF)"
Computation("6x7")
# → "42" (local Decimal, no network)
Computation("p cnf 3 2\n1 2 0\n-1 3 0\n")
# → "SAT assignment: [1, 2, 3]" (raw DIMACS passthrough)
# Non-matching prompts → empty string + .recognized = False
c = Computation("explain gravity")
c.recognized # False

Attributes:
- str(c) — the verified answer (or "" if no pattern)
- c.recognized — True if a computable pattern was detected
- c.pattern — "factor" | "dimacs" | "arith" | "coloring" | "none"
- c.proof — DIMACS header, SAT assignment, Kissat ms, etc.
- c.elapsed_ms, c.cost_usd

How it works under the hood. The factoring encoder builds a full binary-multiplier CNF via AND/XOR/full-adder gadgets (Dana-theorem polynomial construction): each bit of P × Q is a fresh SAT variable, column sums use ripple-carry adders, non-triviality is P ≥ 2 ∧ Q ≥ 2. The CNF is sent to Kissat on npdollars which returns either SAT (with an assignment we decode into P and Q) or UNSAT (N is prime).
v0 limits. The binary-multiplier CNF exceeds the npdollars nginx 413 body limit past ~16-bit N. Demo-sized numbers only; server-side streaming upload is the extension path.
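
A returned SAT assignment is cheap to check locally, which is what makes the answer "verified". The sketch below is independent of the SDK: `parse_dimacs` and `satisfies` are hypothetical helpers, and the parser assumes one clause per line, which holds for the DIMACS example shown earlier but not for every DIMACS file.

```python
def parse_dimacs(text):
    """Parse a DIMACS CNF string into a list of clauses (lists of ints)."""
    clauses = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "cp%":
            continue  # skip comments, the problem line, and end markers
        literals = [int(token) for token in line.split()]
        if literals and literals[-1] == 0:
            literals = literals[:-1]  # drop the clause terminator
        if literals:
            clauses.append(literals)
    return clauses


def satisfies(clauses, assignment):
    """True when every clause contains at least one literal from the assignment."""
    true_literals = set(assignment)
    return all(any(lit in true_literals for lit in clause) for clause in clauses)


cnf = "p cnf 3 2\n1 2 0\n-1 3 0\n"
clauses = parse_dimacs(cnf)
print(clauses)
print(satisfies(clauses, [1, 2, 3]))
print(satisfies(clauses, [1, 2, -3]))
```

The same idea applies to factoring results: multiplying the decoded P and Q and comparing against N costs nothing next to the solver call.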
Live receipts:

| Input | Wall | Result |
|---|---|---|
| Computation("factor 15") | 156 ms | 15 = 3 × 5 (Kissat: 1.5 ms, 245 vars) |
| Computation("factor 2027") | 417 ms | 2027 is prime (UNSAT) |
| Computation("6x7") | 0 ms | 42 (Decimal, no network) |

Use as a parallel branch in Synthax: fire alongside the LLM draft,
dismiss when .recognized=False, promote to winner when it's True —
skipping adversary/revise/verify and saving tokens.
Snake classification on data you supply inline. ML detects a CSV
block in the prompt (header + ≥2 data rows) alongside a classify-shaped
verb, trains a Snake via snakebatch.aws.monce.ai/csv/run, and returns
<class> (p=<confidence>).
from monceai import ML
r = ML('''Classify: is (5.1, 3.5, 1.4, 0.2) a setosa?
sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.4,3.2,4.5,1.5,versicolor
''')
str(r) # "setosa (p=0.97)"
r.prediction # "setosa"
r.confidence # 0.97
r.proof # Snake audit trail: model_id, literal tests

No CSV → r.recognized = False. Used by Synthax as a parallel branch
for prompts that smell like "predict / classify / is X a Y given this
data", same early-exit semantics as Computation.
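
The detection rule (a header plus at least two data rows, alongside a classify-shaped verb) can be approximated as below. This is a guess at the shape of the check, not ML's actual detector: `find_csv_block`, `looks_like_ml_prompt` and the verb list are hypothetical, and a "CSV block" is assumed to be the longest run of consecutive lines with the same number of comma-separated cells.

```python
CLASSIFY_VERBS = ("classify", "predict")  # assumed trigger words


def find_csv_block(prompt, min_data_rows=2):
    """Return (header, rows) for the longest run of equal-width CSV lines, else None."""
    best, run = [], []
    for line in prompt.splitlines():
        cells = [cell.strip() for cell in line.split(",")]
        if len(cells) >= 2 and run and len(cells) == len(run[0]):
            run.append(cells)
            continue
        if len(run) > len(best):
            best = run
        run = [cells] if len(cells) >= 2 else []
    if len(run) > len(best):
        best = run
    if len(best) < 1 + min_data_rows:
        return None
    return best[0], best[1:]


def looks_like_ml_prompt(prompt):
    has_verb = any(verb in prompt.lower() for verb in CLASSIFY_VERBS)
    return has_verb and find_csv_block(prompt) is not None


prompt = "\n".join([
    "Classify: is (5.1, 3.5, 1.4, 0.2) a setosa?",
    "sepal_length,sepal_width,petal_length,petal_width,species",
    "5.1,3.5,1.4,0.2,setosa",
    "4.9,3.0,1.4,0.2,setosa",
    "7.0,3.2,4.7,1.4,versicolor",
])
header, rows = find_csv_block(prompt)
print(header, len(rows), looks_like_ml_prompt(prompt))
```

Note that the question line itself contains commas; comparing cell counts is what keeps it out of the block.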
Fire-and-forget N-label classification over arbitrary context — text, documents (paths / bytes / tuples), and free-form side arrays. The constructor returns immediately and runs a background pipeline:
- Phase 1 (Haiku, ~7s): fast verdict on email text + filenames. Fires in parallel with VLM extraction so .preview lands as soon as Haiku responds, not after the documents finish extracting.
- Phase 2 (Sonnet via charles-json, ~18s): strict verdict on the full VLM-fused context with evidence and flippers.
- Guaranteed answer within timeout (default 30s): if Phase 2 doesn't finish in time or produces unparseable JSON, the Phase 1 preview is promoted to .label and marked .tentative=True. Never hangs, never raises.
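The fallback mechanism in the last bullet can be sketched as follows. This is an illustrative stand-in, not the Classifier internals: `TwoPhaseVerdict`, `DEFAULT`, and the assumption that each phase returns JSON text are all hypothetical.

```python
import json
import threading
import time

DEFAULT = {"label": "informative", "confidence": 0.0}  # assumed last-resort verdict

class TwoPhaseVerdict:
    """The constructor returns immediately; both phases run on daemon threads."""

    def __init__(self, fast_fn, deep_fn, timeout=30.0):
        self._deadline = time.monotonic() + timeout
        self._results = {}
        self._done = {"fast": threading.Event(), "deep": threading.Event()}
        for name, fn in (("fast", fast_fn), ("deep", deep_fn)):
            threading.Thread(target=self._run, args=(name, fn), daemon=True).start()

    def _run(self, name, fn):
        try:
            self._results[name] = json.loads(fn())  # phases return JSON text
        except Exception:
            self._results[name] = None              # failed or unparseable
        finally:
            self._done[name].set()

    def _wait(self, name):
        # Never block past the shared deadline
        self._done[name].wait(max(0.0, self._deadline - time.monotonic()))
        return self._results.get(name)

    @property
    def preview(self):
        return self._wait("fast") or dict(DEFAULT)

    def verdict(self):
        deep = self._wait("deep")
        if deep is not None:
            return {**deep, "tentative": False}
        return {**self.preview, "tentative": True}  # promote the Phase 1 preview
```

A deep phase that times out and one that returns prose instead of JSON are handled identically: the preview is returned and flagged tentative.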
from monceai import Classifier
clf = Classifier(
labels=["order", "quote", "informative"],
rules="order=pipeline-ready PO/BL/invoice; "
"quote=needs human estimator; "
"informative=everything else",
documents=["po_attached.pdf", ("drawing.png", png_bytes)],
text="Merci de me chiffrer l'intercalaire pour du 44.2 rTherm",
factory_id=4,
timeout=30, # hard cap, verdict guaranteed by then
)
clf.preview # {'label': 'quote', 'confidence': 0.95, ...} ~7s
clf.label # 'quote' (blocks up to 30s for the Sonnet verdict)
clf.confidence # 0.95
clf.evidence # ["'me chiffrer' explicit quote request", ...]
clf.flippers # ["if PO number visible in drawing → flip to order"]
clf.runner_up # 'informative'
clf.pipeline_ready # False (needs human)
clf.tentative # False (True if Phase 2 fell back to preview)
clf.elapsed_ms # 17234
| getter | blocks until | typical latency |
|---|---|---|
| .preview / .fast | Phase 1 done | 3-10s |
| .ready_fast / .ready | non-blocking poll | 0ms |
| .label / .confidence / .evidence / ... | Phase 2 done (or timeout→fallback) | ≤30s |
| .wait(timeout=...) | explicit block | user-controlled |
Show the preview to a human operator instantly, then upgrade the card when the strict verdict lands:
clf = Classifier(labels=[...], rules="...", documents=[...], text=...)
render_pending(clf.preview) # instant UI
render_final(clf.to_dict()) # upgrade with evidence once ready
verdicts = Classifier.batch(
jobs=[{"documents": [p], "text": body} for p, body in pairs],
labels=["order", "quote", "informative"],
rules=RULES,
factory_id=4,
timeout=30,
parallel=3, # concurrent classifications
)
Sequential, one classifier at a time, mixed Order / Extraction images paired with realistic French email bodies:
| metric | value |
|---|---|
| accuracy | 10 / 10 (EXCELLENT) |
| avg confidence | 87.4% |
| avg Phase 1 latency | 7.0s |
| avg Phase 2 latency | 17.6s |
| wall time (sequential) | 175.9s (~18s / sample) |
| tentative verdicts | 4 / 10 (recovered by Phase 1 fallback) |
The four tentative verdicts are the mechanism paying out: Phase 2
returned unparseable text on those samples and the Phase 1 preview —
already correct — was promoted automatically. Without the fallback
they would have been "informative" / 0% defaults.
PDFs are rasterized client-side to PNG (first page, 150 DPI) before the
VLM call. The monceapp /v1/chat gateway only forwards images to
Bedrock — raw PDF bytes get dropped silently. Requires PyMuPDF
(pip install pymupdf); if unavailable, PDFs fall back to a multipart
upload attempt. Real-world PDF VLM extraction runs 14-22s, so the
default extract_timeout is 25s and effective timeout should be
≥45s when you pass PDFs.
Raw Bedrock Sonnet ignores strict-JSON instructions under
rules-heavy prompts and produces mixed prose + JSON that fails
json.loads. charles-json enforces strict-JSON server-side.
Swapping haiku/sonnet → charles-json on the same 10-sample
benchmark took accuracy from 6/10 MEH → 10/10 EXCELLENT and
cut wall time from 246s → 176s. Override via fast_model= /
deep_model= kwargs if you need a different routing.
Classifier(
labels, # list[str] — the N mutually-exclusive classes
rules="", # free-form natural-language rules
documents=None, # paths | bytes | (filename, bytes) tuples
text=None, # email body, message, any free-form text
factory_id=0, # factory scope for Monce context
timeout=30, # hard cap on .label blocking
fast_timeout=12, # cap on Phase 1
extract_timeout=8, # per-document VLM cap
extras=None, # dict of arbitrary side arrays → [BLOCKS]
parallel=4, # in-Classifier document extraction workers
fast_model="charles-json",
deep_model="charles-json",
)
A drag-drop-connect canvas for every module in this SDK. Drop nodes, wire ports, hit Play. The right pane emits the exact Python that would produce the same result: copy, paste, run locally.
https://monceapp.aws.monce.ai/playground
Features
- 14 module nodes: Context, Charles, Moncey, Json, Matching, Calc, Diff, Concierge, Architect, Google, Synthax, Computation, ML, Arbiter
- Fan-in: a node can have multiple upstream parents, concatenated under [Label] headers for LLM nodes
- Arbiter node: Sonnet synthesizes N candidate answers into one, citing [Agent N] per claim
- Colored ports by payload type: text (blue) · document (red) · number (green) · web (yellow) · proof (emerald) · synth (purple)
- Live SSE streaming: each node paints green the instant it completes; Calc's 87 ms answer appears 4 s before Concierge's 4075 ms answer on a parallel fan-out
- Server-side parallelism: independent nodes at the same topological level run concurrently (ThreadPoolExecutor, max 8 workers)
- Canvas pan: drag empty space to move the viewport; ⊙ Reset view snaps back to origin
- Templates: 6 golden one-tap graphs (Monce Stack · Glass Quote · Ramanujan 1729 · Raw vs Monce · RAG in 3 Nodes · Synthax Flex)
- Save / Load user graphs to localStorage; Import .py parses any snippet of monceai calls into a graph via the server's AST parser at POST /playground/import
- Draft auto-save: every canvas mutation persists to localStorage, so a reload cannot lose your work
- Live Python export: the canvas and the exported snippet always match, byte-for-byte
- Synthax pseudo tab unfolds the 9-stage pipeline as linear Python
- Mobile-friendly: horizontal palette strip, bottom drawer for Python, tap-to-connect ports, pinch-zoom canvas, auto-compact boot scene under 900px
- Shareable URL state: every node position + edge serialized to ?g=... so graphs are pasteable
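The fan-in and level-by-level parallelism described in the feature list can be approximated in a few lines. This is a sketch under assumptions, not the playground server's code: `run_graph` and the node functions are hypothetical, and only the `[Label]` header convention and the 8-worker ThreadPoolExecutor come from the list above.

```python
from concurrent.futures import ThreadPoolExecutor

def run_graph(nodes, edges, max_workers=8):
    """nodes: {name: fn(text) -> str}; edges: [(parent, child)].
    Runs each topological level concurrently and returns {name: output}."""
    parents = {n: [p for p, c in edges if c == n] for n in nodes}
    outputs, remaining = {}, set(nodes)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while remaining:
            # A node is ready once every parent has produced an output
            level = sorted(n for n in remaining
                           if all(p in outputs for p in parents[n]))
            if not level:
                raise ValueError("cycle detected")
            futures = {}
            for n in level:
                # Fan-in: concatenate parent outputs under [Label] headers
                fused = "\n".join(f"[{p}]\n{outputs[p]}" for p in parents[n])
                futures[n] = pool.submit(nodes[n], fused)
            for n, fut in futures.items():
                outputs[n] = fut.result()
            remaining -= set(level)
    return outputs

# Hypothetical four-node graph with stub node functions
graph = {
    "Context": lambda _: "6x7",
    "Calc": lambda text: str(6 * 7),
    "Moncey": lambda text: "six times seven",
    "Arbiter": lambda text: text,  # a real Arbiter would synthesize; this echoes
}
wires = [("Context", "Calc"), ("Context", "Moncey"),
         ("Calc", "Arbiter"), ("Moncey", "Arbiter")]
out = run_graph(graph, wires)
```

Here Calc and Moncey sit on the same level and run concurrently; the Arbiter stub receives both outputs, each under its own header.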
The default boot scene is self-referential: "Moncey, quelle est la feature #1 sur monceapp aujourd'hui ?" fans out to Moncey, Matching, Concierge, Google, and Synthax in parallel, then an Arbiter weaves a single unified response — every part of the Monce stack collaborating on a meta-question.
Powered by proprietary Monce models:
The OS layer. One constructor binds factory_id, tenant, and framework_id
for the session. Every verb routes through the proprietary Monce models
(charles-json, moncey, concierge) — never bare Haiku/Sonnet.
from monceai import MonceOS
os = MonceOS(factory_id=4, tenant="riou", framework_id="field_riou_test")
# Voice/transcript → typed, validated CR (charles-json, 4-payload Monce model)
cr = os.capture(transcript=stt_output, today="2026-04-22")
cr.summary # 2-3 sentences
cr.actions # [Action] — enum-clamped owner_team, deadline, amount_eur
cr.contacts_met # [Contact] — is_new flagged
cr.sentiment # "positive" | "neutral" | "negative"
cr.next_step # NextStep(what, when)
cr.to_json() # schema-stable dashboard payload
A brick kit for Monce OS bricks (Field, Orders, Quotes, Concierge). The SDK primitives (LLM, Json, Matching, Calc, Concierge, Moncey) stay untouched; MonceOS composes them into verbs that bricks consume without re-wiring factory scoping, framework binding, or model selection on every call.
from monceai import MonceOS
os = MonceOS(factory_id=4, tenant="riou", framework_id="field_riou_test")
cr = os.capture(transcript=stt_output) # ~10s, typed, validated
for a in cr.actions: route_to_team(a) # enum → inbox
save_to_s3(cr.to_json()) # tenant-scoped, permanent
- owner_team ∈ {sales_ops, service, quoting, logistics}: enum-clamped; model drift mapped to valid vocab
- priority ∈ {high, medium, low}
- sentiment ∈ {positive, neutral, negative}
- amount_eur is number-or-null, never string
- deadline / next_step.when in ISO 8601 (computed from today=)
- Guard: {"error": "recording_too_short"} for <30s usable speech
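One way to picture the enum clamping and the number-or-null rule is the sketch below. It is an assumption-laden illustration: `clamp_enum`, `clamp_amount`, the fuzzy-match cutoff, and the fallback default are hypothetical and are not taken from the MonceOS implementation.

```python
from difflib import get_close_matches

OWNER_TEAMS = ("sales_ops", "service", "quoting", "logistics")

def clamp_enum(value, allowed, default):
    """Map a possibly drifted model output onto the allowed vocabulary."""
    key = str(value).strip().lower().replace(" ", "_").replace("-", "_")
    if key in allowed:
        return key
    # Fuzzy fallback; the 0.6 cutoff is an arbitrary illustrative choice
    close = get_close_matches(key, allowed, n=1, cutoff=0.6)
    return close[0] if close else default

def clamp_amount(value):
    """Return a float or None, never a string."""
    if value is None or isinstance(value, bool):
        return None
    if isinstance(value, (int, float)):
        return float(value)
    cleaned = str(value).replace("€", "").replace(" ", "").replace(",", ".")
    try:
        return float(cleaned)
    except ValueError:
        return None
```

Clamping at the boundary means downstream routing code can switch on `owner_team` without defensive checks for unexpected values.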
python examples/field_flow.py
Full loop against live factory 4 (VIP / RIOU Glass): capture → route → match
client ACTIF PVC (#55298) → verify arithmetic via Calc → agents
(Moncey, Concierge) for glass decode and account Q&A. ~27s end-to-end.
Train an explainable classifier or regressor on tabular data. No API key.
Cloud training fans out across the v9 100-shard Lambda mesh
(snakebatch.aws.monce.ai); inference runs locally by default via
algorithmeai (pulled in as a
git dep) and dispatches to the cloud on big batches or when audit /
lookalikes need the training population.
pip install git+https://github.com/Monce-AI/monceai-sdk.git
from monceai import Snake
model = Snake(rows, target_index="label") # POST /v9/train, polls to done
model.get_prediction({"x": 0.5, "y": 0.5}) # local — sub-ms, $0
model.get_prediction([{...}, {...}, ...]) # local if small, cloud if big
model.get_probability(X) # → {"A": 0.97, "B": 0.03}
model.get_audit(X) # cloud — needs population
model.get_augmented(X) # → {Prediction, Probability, Audit, Lookalikes}
The constructor mirrors algorithmeai.Snake exactly: same 13 positional and
keyword args (target_index, n_layers, bucket, noise, vocal,
workers, oppose_profile, lookahead, datatypes, …). v9-specific
options (endpoint, cloud_threshold, tau, model_id) are keyword-only,
so positional code is portable between the two libraries.
monceai.Snake lazily fetches the stripped model on first inference call
(/v9/model/{id}/stripped, ~5% of full size, no population) and runs
algorithmeai inference in-process. Every call goes local except:
- batch length ≥ cloud_threshold (default 5000): cloud parallelism wins
- mode in {audit, lookalikes, augmented, candle}: needs the population
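The two exceptions amount to a small dispatch rule, sketched here for clarity. The function `route` and the constant `POPULATION_MODES` are hypothetical names; only the threshold default and the mode set come from the text above.

```python
POPULATION_MODES = {"audit", "lookalikes", "augmented", "candle"}

def route(batch, mode="prediction", cloud_threshold=5000):
    """Return 'cloud' or 'local' following the two exceptions listed above."""
    n = len(batch) if isinstance(batch, list) else 1  # a single dict counts as 1
    if mode in POPULATION_MODES or n >= cloud_threshold:
        return "cloud"
    return "local"
```

Under this rule a 4,999-row batch stays local, while the same batch in audit mode goes to the cloud because the stripped model carries no population.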
m = Snake(rows, target_index="label")
m.get_prediction({"x": 0.5}) # local, ~0.2ms
m.get_prediction([X] * 50_000) # cloud — fans across the mesh
m.get_audit(X) # cloud — population modes always cloud
# Override the threshold per-instance
fast = Snake(model_id="v9-...", cloud_threshold=10_000)
Snake(rows, target_index="label") # train on cloud
Snake("training_rows.json") # train from a JSON file of rows
Snake("trained_model.json") # upload an algorithmeai model to v9
# (stages model.json + stripped on S3)
Snake(model_id="v9-abc123-1234") # reconnect to an existing model
Snake("v9-abc123-1234") # same, string shorthand
Snake("model.json") automatically detects whether the file is training rows
(list) or a fully-trained Snake (dict with "layers"); the trained-model
case POSTs to /v9/upload and returns an SDK instance ready for cloud
predict-at-scale on a model you trained offline.
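The file-type detection described above (a list means training rows, a dict with "layers" means a trained model) could look roughly like this. `classify_snake_file` is a hypothetical helper for illustration; the SDK's own check may inspect more fields.

```python
import json

def classify_snake_file(path):
    """Return 'rows' for a JSON list of training rows, 'model' for a dict
    carrying a "layers" key; raise on anything else."""
    with open(path, encoding="utf-8") as fh:
        payload = json.load(fh)
    if isinstance(payload, list):
        return "rows"
    if isinstance(payload, dict) and "layers" in payload:
        return "model"
    raise ValueError(f"{path}: neither training rows nor a trained model")
```

Dispatching on the top-level JSON type keeps the constructor signature small: one positional argument covers training, upload, and reconnect.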
monceai.Snake extends algorithmeai.Snake: every method on the local
class is reachable on the cloud one. Unknown attributes lazy-load the
stripped model and forward through:
m.targets # ['A', 'B']
m.header # ['label', 'x', 'y']
m.layers # list[bucket-list]
m.oppose(A, B) # → literal
m.apply_literal(X, lit)
m.full() # download full model with population, return algorithmeai.Snake
Use m.full() when you need population-bearing methods offline; it
caches the full model in /tmp and returns a plain algorithmeai.Snake.
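Attribute forwarding with a lazy load is a standard Python pattern built on `__getattr__`. The sketch below shows the general technique with stand-in classes (`LocalModel`, `CloudProxy`, and the `fetches` counter are hypothetical); it makes no claim about how monceai.Snake is actually implemented.

```python
class LocalModel:
    """Stand-in for the local class that owns the real attributes."""
    targets = ["A", "B"]

    def oppose(self, a, b):
        return f"literal({a}, {b})"

class CloudProxy:
    def __init__(self, loader):
        self._loader = loader  # callable that fetches the stripped model
        self._local = None
        self.fetches = 0       # counts how often the model was fetched

    def _ensure_local(self):
        if self._local is None:
            self._local = self._loader()
            self.fetches += 1
        return self._local

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. for unknown attributes
        if name.startswith("_"):
            raise AttributeError(name)  # avoid recursion on private names
        return getattr(self._ensure_local(), name)

proxy = CloudProxy(LocalModel)
```

The first unknown attribute triggers one fetch; later lookups reuse the cached local model.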
When the target is a float, every prediction yields a candle summarising the lookalike y distribution: high, q3, median, q1, low, mean, iqr_mean, std, n. Same model, same code — Snake just stops voting and starts averaging the consensus middle.
from monceai import Snake, Candle
model = Snake(houses, target_index="price") # train on cloud — float target
c = model.get_candle({"sqft": 1800, "neighborhood": "CollgCr", "yearBuilt": 2003})
c.high, c.q3, c.median, c.q1, c.low # five-number summary
c.mean, c.iqr_mean, c.std, c.n # point estimates + dispersion
c.to_dict() # JSON-friendly
y_hat = model.get_regression(X) # float (IQR-trimmed mean)
candles = model.get_batch_candles(X_test) # one round trip, list[Candle]
preds = model.get_batch_regression(X_test) # list[float]
Why a candle, not a point estimate? A set of lookalikes has no order, so there is no open/close. OHLC is therefore replaced with a five-number summary plus mean, IQR-trimmed mean, std, and n. Wide wicks → "model is unsure, tail risk real." Tight body around the median → "high consensus." You get an empirical spread on every prediction for free, with no bootstrapping.
▲ high ── max(y over lookalikes)
│
╔╧╗ q3 ── 75th percentile
║─║ median ── 50th percentile
╚╤╝ q1 ── 25th percentile
│
▼ low ── min(y over lookalikes)
Why IQR-trimmed mean? get_prediction is mode-like — it returns the most-frequent lookalike y, brittle on continuous targets. iqr_mean averages the middle 50% of lookalikes — robust like the median, smooth like the mean. Both run from the same lookalike fetch (one Lambda round trip per batch).
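To make the candle fields concrete, here is a minimal sketch that computes them from a list of lookalike y values. It is not the exported `compute_candle`: the names `CandleSketch` and `candle_from`, the inclusive quantile method, and the use of the population standard deviation are assumptions.

```python
import statistics
from dataclasses import asdict, dataclass

@dataclass
class CandleSketch:
    high: float
    q3: float
    median: float
    q1: float
    low: float
    mean: float
    iqr_mean: float
    std: float
    n: int

    def to_dict(self):
        return asdict(self)

def candle_from(ys):
    """Five-number summary plus mean, IQR-trimmed mean, std, and n."""
    ys = sorted(float(y) for y in ys)
    if len(ys) < 2:
        raise ValueError("need at least two lookalike values")
    q1, _, q3 = statistics.quantiles(ys, n=4, method="inclusive")
    # Average only the values inside [q1, q3]; fall back to the median
    # when no value lands in that interval (possible for tiny samples)
    middle = [y for y in ys if q1 <= y <= q3] or [statistics.median(ys)]
    return CandleSketch(
        high=ys[-1], q3=q3, median=statistics.median(ys), q1=q1, low=ys[0],
        mean=statistics.fmean(ys), iqr_mean=statistics.fmean(middle),
        std=statistics.pstdev(ys), n=len(ys),
    )

c = candle_from([100, 110, 120, 130, 900])
```

In this toy sample the outlier at 900 drags the plain mean to 272, while the IQR-trimmed mean stays at 120, which is the robustness argument made above.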
Ames Housing benchmark (1168 train / 292 test, 79 features incl. categorical strings — Neighborhood, MSZoning, BldgType, RoofStyle):
| Method | R² | RMSE | MAE |
|---|---|---|---|
| get_prediction (classification path, mode vote) | 0.7003 | $47,975 | $31,693 |
| get_batch_regression (IQR-trimmed mean) | 0.7080 | $47,357 | $25,703 |
+0.77pp R², 18.9% MAE reduction. Cloud trained 1168 rows in 2.3s, full population (1460) in 2.97s with R² = 1.0000 perfect-fit (mode vote and regression both reconstruct the training distribution exactly — Dana Theorem in action). Drop-in parity with algorithmeai.Snake v5.4.6.
| Method | Returns | Description |
|---|---|---|
| get_candle(X) | Candle | Distribution of lookalike y values |
| get_batch_candles(Xs) | list[Candle] | Batched, single Lambda round trip |
| get_regression(X) | float | IQR-trimmed mean of the candle |
| get_batch_regression(Xs) | list[float] | Batched regression |
Candle is a dataclass exposing high, q3, median, q1, low, mean, iqr_mean, std, n plus to_dict(). Both Candle and compute_candle are exported from monceai.
from monceai import Snake
Snake.health()
# {"ok": True, "sdk_version": "2.2.0", "backend_version": "9.0.0",
# "compatible": True, "predict_tau": 300, "n_shards": 100, ...}
compatible == True means the SDK and the v9 backend share a major version.
Run this once at app startup if you want to fail loud instead of failing on
the first Snake(...) call.
python tests/test_snake_v9.py # 67 tests, live against snakebatch.aws.monce.ai
Covers: imports, constructor signature parity with algorithmeai, train via
/v9/train, reconnect by id, single + list prediction, local↔cloud parity,
probability/audit/augmented/lookalikes, attribute forwarding, Snake("model.json")
upload, and to_json round-trip back into algorithmeai.Snake.
For unlimited offline inference, hand the model off explicitly:
from monceai import Snake
from algorithmeai import Snake as Local
cloud = Snake(rows, target_index="label") # train on cloud
local = Local(cloud.to_algorithmeai()) # downloads to /tmp, returns path
local.get_prediction(X) # offline, sub-millisecond
monceai.Snake does this lazily under the hood (m.full() returns the
same instance), so a single m.get_prediction(X) call already costs $0
after the first model fetch. Explicit handoff is for when you want to take
the model to another machine entirely.
Snake's killer feature is get_audit() — a deterministic, layer-by-layer
text dump of the AND-clauses that justified each prediction. snakeaudit
parses that text into honest, per-hypothesis biomarker statistics you can
drop straight into a paper or a slide. It never edits Snake —
algorithmeai>=5.4.6 is a hard, read-only dependency.
from monceai import AuditMiner, Hypothesis, Report, write_comparison
miner = AuditMiner(
hypotheses=[
Hypothesis(
name="BRAF V600E → drug sensitivity",
description="BRAF hotspot drives MAPK addiction.",
feature_columns=["hotspot_mutations", "protein_changes"],
genes=["BRAF"],
pred_class="1",
),
Hypothesis(
name="HRD (BRCA1/2) → PARP sensitivity",
feature_columns=["damaging_mutations", "hotspot_mutations"],
boolean="BRCA1 OR BRCA2 OR ATM OR PALB2",
pred_class="1",
),
],
feature_columns=df.columns.tolist(),
biomarker_genes=("BRAF", "BRCA1", "BRCA2", "ATM", "TP53", "KRAS", "EGFR"),
)
miner.fit(audits=stream, rows=df, target="sensitive")
Report(miner.result()).write("./out") # → out/audit_report.md + out/audit_features.csv
Or load hypotheses from a config file (.toml / .json / .py):
miner = AuditMiner.from_config("hypotheses.toml", feature_columns=cols)
biomarker_genes = ["BRAF", "BRCA1", "BRCA2", "ATM", "PALB2", "TP53"]
[[hypotheses]]
name = "BRAF V600E → Dabrafenib sensitivity"
description = "BRAF hotspot drives MAPK addiction."
feature_columns = ["hotspot_mutations", "protein_changes"]
genes = ["BRAF"]
pred_class = "1"
[[hypotheses]]
name = "HRD (BRCA1/2) → Olaparib sensitivity"
feature_columns = ["damaging_mutations", "hotspot_mutations"]
boolean = "BRCA1 OR BRCA2 OR ATM OR PALB2"
pred_class = "1"
Naïve full-text grep on Snake audits over-counts: many literals
(jaccard("col", "..."), prefix("col", "...")) embed neighbouring
cell-line tokens as comparison constants. A naïve HUGO regex would pick up
genes that aren't even in the audited row. snakeaudit always
cross-checks gene presence against the audited row's ground-truth
feature values before counting.
- Biology: genes (or boolean expression "BRCA1 AND (BRCA2 OR PALB2)") matches the audited row's actual feature_columns content.
- Snake invoked: at least one Snake AND-clause references one of the declared columns.
- Class match: Snake's prediction equals pred_class (when set).
It's fine for a hypothesis never to fire — that's a meaningful negative
result, and the report shows it as 0/N.
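The ground-truth cross-check and the boolean gene expressions can be sketched without any dependency. Everything below is hypothetical: the function names, the token regex used to read gene symbols from a cell, and the choice to combine the three conditions as a conjunction are assumptions, not the snakeaudit implementation.

```python
import re

def genes_in_row(row, feature_columns):
    """Symbols actually present in the audited row's declared columns."""
    found = set()
    for col in feature_columns:
        found.update(re.findall(r"[A-Z0-9]+", str(row.get(col, ""))))
    return found

def eval_boolean(expr, present):
    """Evaluate 'A AND (B OR C)' against a set of present gene symbols."""
    tokens = re.findall(r"\(|\)|[A-Za-z0-9_-]+", expr)
    pos = 0

    def parse_or():
        nonlocal pos
        value = parse_and()
        while pos < len(tokens) and tokens[pos] == "OR":
            pos += 1
            rhs = parse_and()  # always parse the right side, then combine
            value = value or rhs
        return value

    def parse_and():
        nonlocal pos
        value = parse_atom()
        while pos < len(tokens) and tokens[pos] == "AND":
            pos += 1
            rhs = parse_atom()
            value = value and rhs
        return value

    def parse_atom():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_or()
            pos += 1  # skip the closing ")"
            return value
        return tok in present

    return parse_or()

def hypothesis_fires(row, audit_columns, prediction, *,
                     feature_columns, boolean, pred_class=None):
    biology = eval_boolean(boolean, genes_in_row(row, feature_columns))
    snake_invoked = any(col in audit_columns for col in feature_columns)
    class_match = pred_class is None or str(prediction) == pred_class
    return biology and snake_invoked and class_match
```

Because gene presence is read from the row itself, a gene that only appears as a comparison constant inside an audit literal never counts as a hit.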
from monceai import write_comparison
write_comparison(
"audit_comparison.md",
per_slice_results={"A_train": r1, "A_test": r2, "B_train": r3, "B_test": r4},
rf_column_importance=rf_imp, # optional — sklearn aggregated to column level
gb_column_importance=gb_imp,
)
Renders three tables that align Snake's audit hit rate, RF feature importance, and GB feature importance side by side on the held-out average. This is useful for checking inter-method agreement: a column that all three flag is a strong candidate biomarker.
| metric | value |
|---|---|
| audits parsed | 475 |
| dominant predictions | 234 (49.3%) |
| total literal findings | 113546 |
| unique columns referenced | 13 |
## Hypotheses
### BRAF V600E → Dabrafenib sensitivity
> BRAF hotspot drives MAPK addiction.
genes = `BRAF` • columns = `hotspot_mutations, protein_changes` • pred_class = `1`
| metric | value |
|---|---|
| both (headline) | 51/380 (13.4%) |
| both, predicted dominant | 0/188 (0.0%) |
| both, predicted non-dominant | 51/192 (26.6%) |
| Δ(rate_dom − rate_nondom) | +0.266 |
from monceai import (
AuditMiner, Hypothesis, HypothesisSet, Report,
parse_audit, columns_referenced,
load_hypotheses, save_hypotheses,
render_comparison, write_comparison,
)
# or the full sub-package:
import monceai.snakeaudit
AuditMiner(hypotheses, feature_columns, *,
biomarker_genes=(), biomarker_feature_columns=None)
AuditMiner.from_config(path, *, feature_columns,
biomarker_genes=(), biomarker_feature_columns=None)
AuditMiner.empty(feature_columns)
rows accepts a pandas DataFrame or a Mapping[row_index, dict[col, value]];
pandas is optional. Hypothesis configs can be written in TOML, JSON, or Python
(HYPOTHESES = [...]).
See monceai/snakeaudit/README.md and
monceai/snakeaudit/example_hypotheses.toml
for the full guide.
export SAT_API_KEY="sk-sat-..."
from monceai import SAT
result = SAT("p cnf 3 2\n1 2 0\n-1 3 0\n")
result.result # "SAT"
result.assignment # [1, -2, 3]
- requests: the only runtime dependency
- No API key for LLM / VLM / Charles / Snake
- SAT_API_KEY for SAT
- algorithmeai (optional): install for Snake.to_algorithmeai() local handoff
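For the SAT example above, a returned assignment can be checked locally against the DIMACS input without calling any service. The helpers `parse_dimacs` and `check_assignment` are hypothetical and independent of the SDK; they assume the assignment is a list of signed literals, as in the example.

```python
def parse_dimacs(text):
    """Return (n_vars, clauses) from a DIMACS CNF string."""
    n_vars, clauses, current = 0, [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("c"):  # blank or comment line
            continue
        if line.startswith("p"):              # problem line: p cnf <vars> <clauses>
            n_vars = int(line.split()[2])
            continue
        for lit in map(int, line.split()):
            if lit == 0:                      # 0 terminates a clause
                clauses.append(current)
                current = []
            else:
                current.append(lit)
    return n_vars, clauses

def check_assignment(clauses, assignment):
    """True when every clause contains at least one literal set true."""
    true_lits = set(assignment)
    return all(any(lit in true_lits for lit in clause) for clause in clauses)

n_vars, clauses = parse_dimacs("p cnf 3 2\n1 2 0\n-1 3 0\n")
ok = check_assignment(clauses, [1, -2, 3])
```

Verifying a SAT answer is linear in the formula size, so this check is cheap even when solving was not.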
Charles Dana · Monce SAS · 2026