Adds simplesnake [1] and variants to list of clem games. [1] https://github.com/porterrigby/simplesnake#
Hi @bakuzen, do you have baseline results for memory? I cannot imagine that an LLM would not play it perfectly, since all the information is given in the prompt. Is this a needle-in-a-haystack task, or do the contacts include some controllable level of ambiguity?
Big models have no problem, but once it gets up to, say, 50 contacts, medium and smaller models struggle. That is likely due to limited context window size, but there are limits even if one wants to use a good-sized model to fish through information like this. As for controllable ambiguity, we try to set it up so that every individual has a unique note, but we don't explicitly check.
OK, sorry for the late response. I only just realized that this is the wrong repository. You should merge into clembench.
Adding 5 games:
The four games show how well an LM can fish out information about a contact given a complete preprompt, and also whether an LM can "remember" information during the dialogue outside of the prompt and game history.