
Fix MajAtN IndexError when the gold is not the first choice #1274

Open
iamsharduld wants to merge 1 commit into huggingface:main from iamsharduld:fix/majatn-gold-index

@iamsharduld


What

MajAtN.compute builds the scoring Doc from the preprocessed golds:

```python
processed_choices = [self.preprocess(text=g) for g in doc.get_golds()]   # length == #golds
new_doc = Doc(choices=processed_choices, query=doc.query, gold_index=doc.gold_index)
```

but keeps the original gold_index. When the gold is not the first choice, new_doc.get_golds()
looks up new_doc.choices[gold_index], a position past the end of the gold-only list, and raises IndexError:

```python
doc = Doc(query="q", choices=["London", "Paris", "Berlin"], gold_index=[1], task_name="test")
MajAtN(n=3).compute(doc, ModelResponse(text=["Paris", "Paris", "London"]))  # IndexError
# PassAtK on the same doc returns a correct score
```
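To make the mechanism concrete without depending on lighteval, here is a minimal sketch. Doc and preprocess below are hypothetical stand-ins, not lighteval's classes: Doc keeps only the fields involved plus a get_golds that looks up choices by gold_index, and preprocess just strips and lowercases.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for lighteval's Doc: only the fields this example needs."""

    query: str
    choices: list[str]
    gold_index: list[int]

    def get_golds(self) -> list[str]:
        # Golds are looked up by position in `choices`.
        return [self.choices[i] for i in self.gold_index]


def preprocess(text: str) -> str:
    # Hypothetical normalisation; the real metric's preprocessing may differ.
    return text.strip().lower()


doc = Doc(query="q", choices=["London", "Paris", "Berlin"], gold_index=[1])

# Pre-fix construction: choices shrink to the golds only (length 1) ...
processed_choices = [preprocess(g) for g in doc.get_golds()]
new_doc = Doc(query=doc.query, choices=processed_choices, gold_index=doc.gold_index)

# ... but gold_index still points at position 1, which no longer exists.
try:
    new_doc.get_golds()
except IndexError as err:
    print("IndexError:", err)
```

With gold_index=[0] the same construction happens to work, which is why the bug only shows up when the gold is not the first choice.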

Fix

Mirror PassAtK / GPassAtK: preprocess all of doc.choices (not just the golds) and keep gold_index, so new_doc's
choices and gold_index stay consistent for any gold_index.
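A sketch of the fixed scoring path follows. It does not use lighteval: Doc, preprocess, maj_at_n and the exact-match check are hypothetical stand-ins for Doc, MajAtN.preprocess, MajAtN.compute and the metric's comparison step, assumed here for illustration only.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for lighteval's Doc (not the real class)."""

    query: str
    choices: list[str]
    gold_index: list[int]

    def get_golds(self) -> list[str]:
        return [self.choices[i] for i in self.gold_index]


def preprocess(text: str) -> str:
    # Hypothetical normalisation standing in for MajAtN.preprocess.
    return text.strip().lower()


def maj_at_n(doc: Doc, responses: list[str], n: int) -> int:
    """Majority vote over the first n samples, then exact match against the golds."""
    # The fix: preprocess *all* choices and keep gold_index, so positions still line up.
    new_doc = Doc(
        query=doc.query,
        choices=[preprocess(c) for c in doc.choices],
        gold_index=doc.gold_index,
    )
    answers = [preprocess(r) for r in responses[:n]]
    # Most frequent answer; most_common breaks ties by first occurrence.
    majority, _ = Counter(answers).most_common(1)[0]
    # Hypothetical exact-match scorer: 1 if the majority answer is a gold.
    return int(majority in new_doc.get_golds())


doc = Doc(query="q", choices=["London", "Paris", "Berlin"], gold_index=[1])
print(maj_at_n(doc, ["Paris", "Paris", "London"], n=3))  # prints 1
```

Because new_doc keeps every original position, gold_index=[1] still resolves to "paris". Renumbering gold_index to match a gold-only list would also avoid the error; preprocessing all choices simply keeps MajAtN aligned with how PassAtK / GPassAtK build their Doc.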

Tests

tests/test_unit_base_metrics.py::test_maj_at_n_non_first_gold: with gold_index=[1], a correct majority scores 1
and an incorrect majority scores 0. Fails before the fix (IndexError), passes after.
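The test body is not reproduced in this description, so the following is only a dependency-free sketch of the scenario it covers. Doc, score and its fixed flag are hypothetical stand-ins, not lighteval names; the flag toggles between the pre-fix and post-fix way of building the scoring Doc. The function can be run directly or collected by pytest.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for lighteval's Doc (not the real class)."""

    query: str
    choices: list[str]
    gold_index: list[int]

    def get_golds(self) -> list[str]:
        return [self.choices[i] for i in self.gold_index]


def score(doc: Doc, responses: list[str], *, fixed: bool) -> int:
    """Majority vote + exact match; `fixed` selects how the scoring Doc is built."""
    # Pre-fix: choices = golds only. Post-fix: choices = all (lowercased) choices.
    source = doc.choices if fixed else doc.get_golds()
    new_doc = Doc(doc.query, [c.lower() for c in source], doc.gold_index)
    majority, _ = Counter(r.lower() for r in responses).most_common(1)[0]
    return int(majority in new_doc.get_golds())


def test_maj_at_n_non_first_gold() -> None:
    doc = Doc("q", ["London", "Paris", "Berlin"], gold_index=[1])
    # After the fix: correct majority -> 1, wrong majority -> 0.
    assert score(doc, ["Paris", "Paris", "London"], fixed=True) == 1
    assert score(doc, ["London", "London", "Paris"], fixed=True) == 0
    # Before the fix, the same doc raises IndexError.
    try:
        score(doc, ["Paris", "Paris", "London"], fixed=False)
    except IndexError:
        pass
    else:
        raise AssertionError("expected IndexError with the pre-fix construction")


test_maj_at_n_non_first_gold()
```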

MajAtN.compute set new_doc.choices = [preprocess(g) for g in doc.get_golds()]
(length = number of golds) but kept gold_index = doc.gold_index. When the gold
is not the first choice (e.g. gold_index=[1] or [2]), new_doc.get_golds() indexes
beyond the gold-only choices list and raises IndexError. Mirror PassAtK/GPassAtK:
preprocess doc.choices and keep gold_index, so new_doc stays internally consistent.