# Announcement: WFGY 2.0 is live — next, a public exam on Stanford Terminal-Bench #52

onestardao · 2025-09-27T04:56:17Z

onestardao
Sep 27, 2025
Maintainer


TL;DR
WFGY Core 2.0 (a 7-step symbolic reasoning engine) is now live.
Next, we’ll run it through the Stanford Terminal-Bench (TB) public exam.
All model calls will be wrapped by ΔS drift control → Coupler/BBPF → BBAM → DT guards.
Leaderboard results and reproducible scripts will be released once available.

What’s new in WFGY 2.0

Pure math, zero boilerplate. Drop in the 30-line Flagship file or the 1-line OneLine file; the goal is a model that reasons more sharply, stays steadier, and can recover from errors.
7-step chain: Parse/ΔS/Memory → BBMC → Coupler → BBPF → BBAM → BBCR → Drunk Transformer (WRI/WAI/WAY/WDT/WTF).
Observable control: ΔS, λ_observe, E_resonance — measurable signals instead of prompt tricks.

Reference: WFGY Core 2.0, Public PDF (math layer).

Why Terminal-Bench

Terminal-Bench tests LLMs under real terminal-style workflows, stressing multi-step reasoning, robustness, and tool use.
It naturally exposes semantic drift, collapse, and recovery — the exact failure points WFGY was built to fix.

How we’ll test (non-invasive wrapper)

We do not modify TB. Instead, we wrap the model calls:

Guard: monitor ΔS and λ_observe, block illegal cross-paths.
Progress: apply Coupler + BBPF, bridge only when ΔS decreases.
Recover: BBAM rebalances attention; BBCR + DT guards roll back and retry.
Log: output ΔS buckets and λ_observe states.
Compare: run A/B/C — Baseline vs Autoboot vs Explicit Invoke.
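
To make the wrapper idea concrete, here is a minimal sketch of how a model call could be guarded without touching the benchmark. It is not the WFGY implementation: `GuardedModel`, `measure_drift`, `drift_bucket`, the 0.6 threshold, and the 0.4 / 0.6 bucket edges are all hypothetical names and values chosen for illustration, and the drift estimator is passed in as a plain function because this post does not define how ΔS is computed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


def drift_bucket(drift: float) -> str:
    """Coarse label for logging; the 0.4 / 0.6 edges are assumed, not from WFGY."""
    if drift < 0.4:
        return "low"
    if drift < 0.6:
        return "mid"
    return "high"


@dataclass
class GuardedModel:
    """Wraps an unmodified model call with a guard / retry / log loop."""
    call_model: Callable[[str], str]            # the original model call, untouched
    measure_drift: Callable[[str, str], float]  # hypothetical drift estimator
    max_drift: float = 0.6                      # assumed acceptance threshold
    max_retries: int = 2                        # retries after the first attempt
    log: List[Dict[str, object]] = field(default_factory=list)

    def __call__(self, prompt: str) -> str:
        best_output, best_drift = "", float("inf")
        for attempt in range(self.max_retries + 1):
            output = self.call_model(prompt)
            drift = self.measure_drift(prompt, output)
            accepted = drift < self.max_drift
            # Log: record the drift value and its bucket for every attempt.
            self.log.append({
                "attempt": attempt,
                "drift": drift,
                "bucket": drift_bucket(drift),
                "accepted": accepted,
            })
            # Keep the lowest-drift output seen so far as the fallback.
            if drift < best_drift:
                best_output, best_drift = output, drift
            if accepted:
                break  # Guard passed: stop retrying.
        return best_output
```

Because the benchmark only ever sees a callable that maps a prompt to a completion, a baseline run and a guarded run can share the same harness and differ only in which callable is passed in.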

We will publish: minimal scripts, environment manifest, hashed logs, and score tables.
Preview: Terminal-Bench teaser.

What to expect

Public release of seeds, tasks (as TB rules permit), outputs, and hashes.
Metrics: Semantic Accuracy, Reasoning Success, Drift (ΔS), Stability horizon, Self-Recovery.
A clear step-by-step reproduction path.
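
For readers who want to check published hashes themselves, the sketch below shows one common way to build and verify a hash manifest. It assumes SHA-256 and a JSON manifest keyed by artifact name; the post does not say which hash function or file layout will be used, and `build_manifest` and `find_mismatches` are hypothetical helper names.

```python
import hashlib
import json
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(artifacts: Dict[str, bytes]) -> str:
    """Map each artifact name to its digest; sorted keys keep the JSON stable."""
    entries = {name: sha256_hex(blob) for name, blob in artifacts.items()}
    return json.dumps(entries, indent=2, sort_keys=True)


def find_mismatches(manifest_json: str, artifacts: Dict[str, bytes]) -> List[str]:
    """Return names whose content is missing or does not match the manifest."""
    expected = json.loads(manifest_json)
    return sorted(
        name
        for name, digest in expected.items()
        if name not in artifacts or sha256_hex(artifacts[name]) != digest
    )
```

A reproduction run would regenerate the logs and outputs, then call `find_mismatches` against the published manifest; an empty list means every listed artifact matched byte for byte.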

Until leaderboard results are posted, the dedicated TB repo stays “Coming Soon” to protect artifacts.

Quick start (today)

New here? Starter Village
Want the tiny engine? WFGY Core 2.0
Browse fixes: Problem Map

If WFGY helped you fix a real bug, a ⭐ helps others discover it.

—

PSBigBig • WFGY
