# Announcement: WFGY 2.0 is live — next, a public exam on Stanford Terminal-Bench #52
onestardao announced in Announcements
TL;DR
WFGY Core 2.0 (a 7-step symbolic reasoning engine) is now live.
Next, we’ll run it through the Stanford Terminal-Bench (TB) public exam.
All model calls will be wrapped by ΔS drift control → Coupler/BBPF → BBAM → DT guards.
Leaderboard results and reproducible scripts will be released once available.
What’s new in WFGY 2.0
Reference: WFGY Core 2.0, Public PDF (math layer).
Why Terminal-Bench
Terminal-Bench tests LLMs under real terminal-style workflows, stressing multi-step reasoning, robustness, and tool use.
It naturally exposes semantic drift and collapse, and it tests recovery from both: the exact failure points WFGY was built to fix.
How we’ll test (non-invasive wrapper)
We do not modify TB. Instead, we wrap the model calls:
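The wrapping idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `wrap_model_call`, `named_passthrough`, `fake_model`, and the stage identifiers are hypothetical names, and the stages only record that they ran. None of this reflects the actual WFGY logic, which is not described here. The point is only that the benchmark harness receives a callable with the same signature as the original model call, so the harness itself stays untouched.

```python
from typing import Callable, List

# A model call maps a prompt string to a completion string.
ModelCall = Callable[[str], str]
# A stage takes a model call and returns a guarded model call.
Stage = Callable[[ModelCall], ModelCall]


def wrap_model_call(model_call: ModelCall, stages: List[Stage]) -> ModelCall:
    """Compose stages around model_call; the first stage listed runs first."""
    wrapped = model_call
    for stage in reversed(stages):
        wrapped = stage(wrapped)
    return wrapped


def named_passthrough(name: str, trace: List[str]) -> Stage:
    """Hypothetical placeholder stage: records its name, then defers inward."""
    def stage(inner: ModelCall) -> ModelCall:
        def guarded(prompt: str) -> str:
            trace.append(name)  # a real guard would inspect or adjust here
            return inner(prompt)
        return guarded
    return stage


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption, for the demo only)."""
    return f"echo: {prompt}"


trace: List[str] = []
# Placeholder identifiers mirroring the pipeline order named in the TL;DR.
stage_names = ("delta_s", "coupler_bbpf", "bbam", "dt_guards")
stages = [named_passthrough(n, trace) for n in stage_names]

guarded_model = wrap_model_call(fake_model, stages)
result = guarded_model("ls -la")
```

Because `guarded_model` has the same `str -> str` shape as `fake_model`, a harness that accepts a model callable could be handed either one without any change to its own code.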
We will publish: minimal scripts, environment manifest, hashed logs, and score tables.
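One common way to make logs checkable by third parties is to publish a SHA-256 digest per log file. The sketch below shows that approach under stated assumptions: the `*.log` naming, the manifest layout, and the function names `sha256_of_file`, `build_manifest`, and `verify_manifest` are all hypothetical and are not taken from the WFGY scripts.

```python
import hashlib
from pathlib import Path
from typing import Dict


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(log_dir: Path) -> Dict[str, str]:
    """Map each *.log file name in log_dir to its SHA-256 digest."""
    return {p.name: sha256_of_file(p) for p in sorted(log_dir.glob("*.log"))}


def verify_manifest(log_dir: Path, manifest: Dict[str, str]) -> bool:
    """True only if the current logs match the published digests exactly."""
    return build_manifest(log_dir) == manifest
```

A reader who downloads the logs and the manifest could call `verify_manifest` to confirm that no log file was added, removed, or altered after publication.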
Preview: Terminal-Bench teaser.
What to expect
Until leaderboard results are posted, the dedicated TB repo stays “Coming Soon” to protect artifacts.
Quick start (today)
—
PSBigBig • WFGY