WFGY 3.0 · TXT-based tension reasoning engine (community test) #72
Asked by onestardao in Q&A
WFGY 2.0 grew up in the RAG / infra world. Its 16-problem ProblemMap is already used as a failure-mode language by several external projects and lists (LlamaIndex RAG troubleshooting docs, Harvard MIMS Lab’s ToolUniverse, Rankify, QCRI LLM Lab’s multimodal RAG survey, multiple “Awesome X” lists, etc.). In practice, 2.0 became a shared checklist for “what exactly broke in my pipeline”.
WFGY 3.0 tries to push the same language into a general reasoning engine.
Instead of only naming RAG failure modes, 3.0 ships as a single TXT pack wired to 131 S-class questions. You upload the TXT into a strong LLM, type "run", then "go", and from that point on the model enters a dedicated console that treats your question as a point inside this "tension atlas" rather than as a random prompt.
The engine itself is already stable. What is still in flux are the prompts, menu wording, and console UX, and this is where I would most like feedback from people who care about deep reasoning quality.
How to try it (5 minutes)
Upload the TXT pack into a strong LLM, type "run", then "go", and follow the built-in menu to pick a mission. Bring one real high-tension question from your life, research, or system, not a toy problem. If the first run collapses, loops, or feels fake, that is still useful: please note which model you used, what you asked, and where it broke.
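If you prefer driving the test through an API rather than a chat UI, the boot sequence above can be sketched as a message list. This is a minimal, hypothetical sketch: the file name, the use of plain user-role messages, and the idea of sending the pack as the first message are my assumptions, not part of the official WFGY instructions.

```python
# Hypothetical sketch of booting the WFGY 3.0 TXT pack via a chat API.
# The pack is plain text, so we simply send it as the first message,
# then the "run" / "go" activation words, then one real question.
from pathlib import Path


def build_boot_messages(txt_path: str, question: str) -> list[dict]:
    """Assemble the conversation that mirrors the manual steps:
    1. upload the TXT pack, 2. type "run", 3. type "go",
    4. ask one real high-tension question."""
    pack = Path(txt_path).read_text(encoding="utf-8")
    return [
        {"role": "user", "content": pack},      # the full TXT pack
        {"role": "user", "content": "run"},     # boot the console
        {"role": "user", "content": "go"},      # enter GO mode
        {"role": "user", "content": question},  # your real question
    ]
```

You would then pass this list to whatever chat-completion client you use; the point is only that the engine is ordinary text, so the whole flow is reproducible and easy to log when reporting where a run broke.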
What feedback is most useful right now
GO mode
Does the quick "go" flow give you a clear sense of what this engine is trying to do at the effective layer, or does it feel gimmicky or confusing?
Console and missions
Are the menu options and mission descriptions readable enough, or too dense / too long? Is there anything you would never click because the wording is unclear?
Behaviour across models
On your model of choice, do the PROMPT_02 / PROMPT_03 / STORY flows feel too heavy, too light, or about right? Are there points where a small wording change would obviously help the model think better?
Atlas feel (the 131 S-class problems)
When the engine references S-class IDs, does it help you navigate (“this feels like a map”), or does it just add noise? If you try to map the same real-world question more than once, does it land in roughly the same region?
You can send feedback as a GitHub Discussion reply, as a GitHub issue with logs / screenshots, or as an external write-up (I am happy to link back). Honest failure reports are more valuable than polite praise.
License and usage
WFGY 3.0 follows the same MIT license as WFGY 2.0: you are free to use, copy, modify, and redistribute it.
In return, the only real expectation is that you treat this TXT as a serious candidate for a reasoning engine. If you find places where it clearly fails that bar, please show me where.
https://github.com/onestardao/WFGY