Releases: yaniv-golan/proof-engine
Releases · yaniv-golan/proof-engine
v1.18.0
Added
- KaTeX math rendering — mathematical notation in proof claims and narratives now renders as typeset math across all site surfaces. Three rendering paths: KaTeX client-side for headings and narrative markdown,
pymdownx.arithmatexfor markdown pipeline protection, andstrip_latex()Unicode conversion for plain-text surfaces (OG tags, JSON-LD, citations, page titles). tools/lib/latex_utils.py—strip_latex()function converts\(...\)LaTeX delimiters to Unicode equivalents (Greek letters, sub/superscripts, operators) for contexts where client-side rendering is unavailable.tools/add-latex-to-claims.py— interactive script for retroactive conversion of math-heavy proof claims to use LaTeX delimiters. Supports dry-run mode, manual editing, and preserves proof.json/proof.py provenance parity. Skips DOI-backed proofs.- KaTeX v0.16.45 vendored — self-hosted CSS, JS, auto-render plugin, and 60 font files at
site/static/vendor/katex/. pymdownx.arithmatex— integrated into the markdown sanitizer to protect\(...\)and\[...\]delimiters from markdown processing. Configured withinline_syntax: ["round"]andblock_syntax: ["square"]to avoid$...$currency collisions.- Math rendering in catalog —
renderMathInElementcalled after card rendering in bothcatalog.jsandcatalog-enhance.js.
Changed
tools/build-site.py— registersstrip_latexas a Jinja2 filter; pre-stripsclaim_naturalin pipeline example data.tools/lib/json_ld.py— appliesstrip_latex()toclaimReviewedfield.tools/lib/citation.py— appliesstrip_latex()to citation claim text.site/templates/proof.html—strip_latexfilter on title, OG tags, meta description, and share bar;<h1>left raw for KaTeX client-side rendering.site/templates/landing.html—strip_latexfilter on myth-card claims and featured proofs data.- CI workflows —
pymdown-extensionsadded to pip install invalidate.yml(both jobs) anddeploy-site.yml. - SKILL.md — added LaTeX delimiter guidance for proof authors.
v1.17.0
Added
proof_format_schema.json— single source of truth for proof markdown section requirements, shared between the proof-engine skill (producer) and site builder (consumer). Defines v1/v2 profiles forproof.md,proof_audit.md, andproof_narrative.md, plus conditional sections and template fallback mappings.
Changed
proof_loader.py— section requirements now read fromproof_format_schema.jsoninstead of hardcoded constants. Profile selection usesoriginal_format_versionto choose v1 or v2 validation rules.narrative_validator.py— required narrative sections now sourced from schema instead of a hardcoded list.proof.html— replacedformat_versionbranching with fallback chains (Quality ChecksorHardening Checklist,Source DataorExtraction Records, audit or proof.mdClaim Interpretation).output-specs.md— added schema reference, documentedProofSummaryBuilderas primary emission path, fixed narrative heading casing to title-case.SKILL.md— documentedProofSummaryBuilderin Bundled Scripts table and Key function signatures, updatedemit_proof_summarygotcha, fixed narrative heading casing.
Fixed
- Legacy
emit_proof_summary()now defaultsformat_versionto 2 — proofs generated via the legacy path no longer land with missingformat_version, which caused the loader to apply v1 section requirements to v2-style proofs.
v1.16.0
Added
rejection_statementfield for disproof proofs — eachempirical_factsentry in a disproof must include arejection_statementfield: the verbatim phrase from the quote that explicitly rejects the claim.validate_proof.pywarns when the field is absent and raises an issue when it is present but not a substring of the associatedquote. Replaces the 25-patternREJECTION_MARKERSvocabulary scan.is_time_sensitivefield inCLAIM_FORMAL— proofs that depend on the current date declare"is_time_sensitive": TrueinCLAIM_FORMAL.validate_proof.pyuses AST to read this field and enforces four behavioral branches (declared+today → pass; declared+no today → issue; today without declaration → warning; hardcoded date without today → issue). Replaces comment-strip + regex keyword scan.verbatimfield perempirical_factsentry — authors can declare"verbatim": Falsewhen a quote is paraphrased.validate_proof.pychecks this field structurally: warns onverbatim: False, raises an issue onverbatim: Truewith an ellipsis (contradiction), and nudges on ellipsis without any declaration. Replaces ellipsis-only heuristic.subclaim_to_sourcesmap inCLAIM_FORMAL— compound proofs can declare an explicitsubclaim_to_sourcesdict mapping each sub-claim ID to its list ofempirical_factskeys.validate_proof.pyPath 1 uses this map directly; Path 2 falls back to key-prefix inference for proofs that don't provide it.- AST-based Rule 5 check —
adversarial_checksis verified via AST list-element count, not vocabulary scanning. Empty list → issue;count ≥ 1→ pass with count; non-list or missing → regex fallback. - W3C PROV-JSON export —
tools/lib/prov.pygeneratesprovenance.jsonper proof: a W3C PROV-JSON provenance chain mapping each evidence entity, citation-verification activity, and cross-check derivation back to the Proof Engine agent. Included in RO-Crate packages. - SARIF 2.1.0 export —
tools/lib/sarif.pyconvertsvalidate_proof.pyresults to SARIF 2.1.0. Each hardening rule maps to a stable rule ID (PE001–PE010); issues areerrorlevel, warnings arewarning. Enables integration with GitHub Code Scanning and other SARIF-aware tooling. - RO-Crate 1.1 packaging —
tools/lib/ro_crate.pygeneratesro-crate-metadata.jsonper proof: a standards-compliant research object manifest listing all proof artifacts (proof.py,proof.json,proof.md,proof_audit.md,proof_narrative.md,provenance.json,proof.ipynb) with typed schema.org roles and DOI links. - Jupyter Notebook export —
tools/build-site.pygeneratesproof.ipynbper proof: a two-cell Jupyter Notebook that installs dependencies and re-runsproof.pyin an interactive environment. Included in the RO-Crate manifest as aComputationalNotebook. - Enhanced Schema.org JSON-LD — proof pages now include
isBasedOn(links to each cited source URL),mainEntity(theClaimReviewblock), andsameAsprovenance links. Improves search engine and Linked Data discoverability.
Changed
validate_proof.pydesign principle — all new checks read structured fields declared by the LLM at generation time; validator does mechanical verification only (substring containment, list length, field presence). No semantic inference from free text.- Hardening rules documentation — updated validator notes for Rule 3 (is_time_sensitive behavioral branches), Rule 5 (AST non-empty list check), and Rule 8 (rejection_statement enforcement).
output-specs.md— addedrejection_statement,Verbatim status, andTime sensitivityto Citation Verification Details.template-qualitative.md— addedis_time_sensitivecomment toCLAIM_FORMAL,verbatimcomment toempirical_facts, and expanded disproof variant to showrejection_statementfield explicitly.template-date-age.md—is_time_sensitive: Truenow included inCLAIM_FORMAL.template-compound.md— commentedsubclaim_to_sourcesblock added toCLAIM_FORMAL.
v1.15.0
Added
- Proof detail page redesign — restructured proof detail template with verdict qualifier line, jump links (summary · caveats · sources · audit trail), promoted counter-evidence section ("What could challenge this verdict?"), canonical sources table, collapsible downloads, and single generator footer
- Format version support —
format_versionfield inproof.jsonenables v1/v2 proof format branching in loader and template. V2 proofs use renamed sections (Quality Checks, Source Data) and move Claim Interpretation to the audit trail _SOURCE_TYPE_DISPLAY_LABELS— capitalized source-type labels for the detail page, separate from the lowercase landing-page labels
Changed
- Proof detail template — evidence accordion slimmed to 3 sections (Evidence Summary, Proof Logic, Conclusion); audit trail reordered with format-version-aware section list; page title truncated at 50 chars
- Output specs (v2) — renamed Counter-Evidence Search → "What could challenge this verdict?", Hardening Checklist → Quality Checks, Extraction Records → Source Data; Claim Interpretation moved from
proof.mdtoproof_audit.md - Proof loader — v1/v2 required/optional section lists;
format_versionhoisted to top-level proof dict; Claim Specification made optional for v1 (3 existing proofs lack it)
Fixed
- Generator footer stripping — regex handles both plain and italic-wrapped (
*Generated by...*) footers - Dead inline analytics script — removed from template (proof-enhance.js handles it)
- Jump links spacing — fixed Jinja2 whitespace control for dot separators
v1.12.0 — Agent Guardrails
Added
apply_verdict_qualifier()helper incomputations.py— validates base verdict against the 5-value taxonomy and only appends "(with unverified citations)" to the 3 qualifiable verdicts (PROVED, DISPROVED, SUPPORTED). Prevents agents from constructing invalid verdict stringsemit_proof_summary()helper incomputations.py— validates proof summary keys against theProofDataTypedDict schema before printing, raisingValueErroron unknown keys. Prevents agents from inventing schema fields- Verdict validity check in
validate_proof.py— detects invalid verdict strings and the+=antipattern for building verdicts - FACT_REGISTRY format check in
validate_proof.py— ensures registry entries are dicts (not plain strings) with required keys per fact type claim_naturalkey check invalidate_proof.py— warns when bare"claim"is used instead of the required"claim_natural"keyemit_proof_summaryadoption check invalidate_proof.py— warns when proofs use rawjson.dumpsinstead of the schema-validated helper- Type guard in
verify_citations.py—build_citation_detail()raisesTypeErrorwith actionable message when FACT_REGISTRY entries are strings instead of dicts - Key stripping in
proof_runner.py— unknown keys are silently stripped from proof JSON during publish, with stderr warning. Last line of defense after generation-time validation
Changed
- All 6 proof templates refactored to use
apply_verdict_qualifier()andemit_proof_summary(), replacing manual verdict construction and rawjson.dumps check_json_summary()updated to recognizeemit_proof_summary()as a valid summary output method- Missing-section errors in
proof_loader.pynow include the list of found sections for easier debugging
v1.11.0 — Citation Verification Hardening
Added
- Inline LaTeX
$...$stripping innormalize_text()— arXiv abstract pages with raw LaTeX like$\Lambda$CDMand$H_0 = 67.4\pm 0.5$now normalize correctly. Three-pass regex handles complex LaTeX, single-letter variables, and unadorned multi-letter tokens - Scoped Greek-to-ASCII transliteration — Greek letters from LaTeX output (Λ→L, Ω→O, etc.) are transliterated for matching, while non-LaTeX Greek (μm, ρ) is preserved to avoid false positives
- Math operator spacing collapse — ar5iv MathML rendering produces
Ω m = 0.315 ± 0.007with spaces; new steps 3a/3b collapse Greek-Latin spacing and operator spacing - Closest-passage suggestion engine —
_find_closest_passage()uses Jaccard word-set similarity to show a diagnostic hint when quotes fail verification. Ephemeral (console output only, not persisted to proof.json) - GitHub raw README fallback — bare
github.com/owner/repoURLs that return a JS-rendered React shell now fall back toraw.githubusercontent.comwith multiple README filename candidates. Reportsfetch_mode='github_raw' - Ellipsis detection in
validate_proof.py— AST-based quote extraction warns when quotes contain...or…, a strong signal of spliced non-adjacent text - Real-world demonstration search directive — Step 2 now prompts searching for practical applications of the claimed mechanism (not just benchmarks), after field testing revealed this gap
Changed
- Verbatim quoting enforcement — SKILL.md, hardening-rules.md, and environment-and-sources.md now explicitly prohibit paraphrased quotes with bad/good examples, a Quote Harvesting gate in Step 2, a pre-flight citation check in Step 3, and a Citation Recovery Loop as Step 5.5
- PDF citation guidance — rewritten to recommend snapshot workflow using Claude Code's native PDF reading; arXiv section added recommending ar5iv HTML over arxiv.org/abs
- Self-critique checklist — added verbatim quote verification and PDF snapshot checks
v1.10.0 — Formal Citations & Zenodo DOIs
chore: bump version to 1.10.0, update changelog and docs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
v1.9.0 — Claim-Fidelity Hardening
release: v1.9.0 — claim-fidelity hardening (Rule 8, entailment gaps, …
v1.8.0 — Proof Narrative Layer
fix: audit accordion scrolls instead of expanding vertically Root cause: overflow-x:auto on .audit-body.animated.open forces overflow-y:auto per CSS spec, creating a scroll container instead of letting nested <details> expand the panel. Also added setTimeout fallback for transitionend (which can silently fail), leaving max-height capped at the initial pixel value. Fix: use overflow:visible on the open audit body, move overflow-x:auto to inner pre/table elements only. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
v1.7.0
Added
- Scripts: context-dependent
<sup>/<sub>handling — superscripts in running prose are stripped (e.g., footnote markers), but preserved as exponents in mathematical/scientific contexts (e.g., "10²", "m²") - Scripts: inline HTML tag stripping without injecting spaces — tags like
<span>,<em>,<a>inside quotes no longer break matching - Scripts: two-pass matching — first try exact match on cleaned text, then fall back to substring search
- Scripts: expanded Unicode invisible character normalization — strips zero-width spaces (U+200B), zero-width non-joiners (U+200C), zero-width joiners (U+200D), word joiners (U+2060), left-to-right/right-to-left marks (U+200E/U+200F), soft hyphens (U+00AD), and variation selectors (U+FE00–U+FE0F)
- Scripts: MathML
<math>tag extraction — extractsalttextattribute content and converts LaTeX notation to readable text via newlatex_text.pymodule - Scripts:
latex_text.py— converts LaTeX math notation (fractions, Greek letters, operators, superscripts/subscripts) to plain text for citation matching - Tests: integration tests for all three false-negative classes (superscript/inline-tag, invisible Unicode, MathML alttext)
Fixed
- 4 site proofs upgraded from "with unverified citations" to clean verdicts:
smartphone-screens...,the-assertion-that-no-arab-state...,the-schwarzschild-radius...,current-ai-systems-have-already...