adding new games to clembench by bakuzen · Pull Request #239 · clp-research/clemcore

bakuzen · 2025-10-09T18:03:58Z

Adding 5 games:

memory - a game where the LM is given contact info for 50 people (first name, last name, work, email, and a note, where the note is a unique attribute such as a hobby) and is then asked for contact info based on the note.
memory_narrative - same game as memory, but all information about contacts is given in the preprompt.
memory_turns - information about new contacts is given at each dialogue turn. One question is also asked about a contact given at a prior turn.
memory_narrative_turns - questions about contacts are asked at each turn, but all information about contacts is given in the preprompt.

The four memory games show how well an LM can fish out information about a contact given a complete preprompt, and also whether an LM can "remember" information during the dialogue outside of the prompt and game history.
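
To make the task concrete, here is a minimal sketch of a contact record and a note-based lookup. The `Contact` class, its field names, the sample data, and `find_by_note` are hypothetical and only illustrate the description above; they are not taken from the game code in this PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Contact:
    # Hypothetical field names mirroring the fields listed in the description.
    first_name: str
    last_name: str
    work: str
    email: str
    note: str  # intended to be unique per contact, e.g. a hobby


def find_by_note(contacts: list[Contact], note: str) -> Optional[Contact]:
    """Return the first contact whose note matches (case-insensitive), or None."""
    wanted = note.strip().lower()
    for contact in contacts:
        if contact.note.strip().lower() == wanted:
            return contact
    return None


# Made-up sample data, for illustration only.
contacts = [
    Contact("Ada", "Lovelace", "Analytical Engines Ltd", "ada@example.com", "plays the violin"),
    Contact("Alan", "Turing", "Bletchley Park", "alan@example.com", "runs marathons"),
]

match = find_by_note(contacts, "Runs marathons")
print(match.email if match else "no match")
```

A reference lookup like this could serve as the gold answer that a model's reply is compared against, assuming the notes really are unique.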

simplesnake - a game for evaluating LLMs on 2D text-based spatial reasoning tasks.

Adds simplesnake [1] and its variants to the list of clem games. [1] https://github.com/porterrigby/simplesnake#

phisad · 2025-10-13T07:56:37Z

Hi @bakuzen, do you have baseline results for memory? I cannot imagine that it is not played perfectly by an LLM, since all information is given in the prompt. Is this like a needle-in-a-haystack task, or do the contacts include some controllable level of ambiguity?

bakuzen · 2025-10-13T18:05:50Z

Big models have no problem, but when it gets up to, say, 50 contacts, medium and smaller models struggle. This is likely due to limited context window size, but there are limits even if one wants to use a good-sized model to fish through information like this. As for controllable ambiguity, we try to set it up so that all individuals have a unique note, but we don't explicitly check.
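
A minimal sketch of what an explicit uniqueness check could look like, assuming each contact is a dict with a `note` key (a hypothetical representation, not necessarily the one used in the game code):

```python
from collections import Counter


def duplicate_notes(contacts: list[dict]) -> list[str]:
    """Return the normalized notes that occur more than once, sorted."""
    # Normalize so that "Hiking" and " hiking " count as the same note.
    counts = Counter(c["note"].strip().lower() for c in contacts)
    return sorted(note for note, n in counts.items() if n > 1)


# Made-up sample data, for illustration only.
sample = [
    {"first_name": "Ada", "note": "Hiking"},
    {"first_name": "Alan", "note": "chess"},
    {"first_name": "Grace", "note": " hiking "},
]
print(duplicate_notes(sample))
```

An empty result would mean every note identifies exactly one contact, so each question has a single correct answer.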

phisad · 2025-11-14T13:49:56Z

OK, sorry for the late response. I only just realized that this is the wrong repository. You should merge into clembench.

bakuzen and others added 7 commits January 7, 2025 19:42

adding three of the four memory games

fce59ec

added mermory narative turns game, tested on multiple LMs

ce618f2

added mermory narative turns game, tested on multiple LMs

1a6c9e8

new game askmissing

9b15709

changes to model registry from origin

e1f24ee

Adds and updates simplesnake for Clembench 2.0

5aa79e4

Merge pull request #1 from bsu-slim/simplesnake

a68e578


bakuzen wants to merge 7 commits into clp-research:main from bsu-slim:main
