Fix helm_normalizer mis-scoring numbers (homogeneize before remove_punc) by iamsharduld · Pull Request #1273 · huggingface/lighteval

iamsharduld · 2026-06-25T20:01:34Z

What

In helm_normalizer (the quasi-exact-match / HELM normalizer), remove_punc runs before
homogeneize_numbers, so it strips the decimal point before the float() cast. The result is wrong
on numeric answers:

from lighteval.metrics.normalizations import helm_normalizer
from lighteval.metrics.metrics_sample import ExactMatches
em = ExactMatches(normalize_gold=helm_normalizer, normalize_pred=helm_normalizer)

em.compute_one_item("10", "1.0")   # 1  -> distinct numbers scored as EXACT MATCH (false positive)
em.compute_one_item("3.14", "314") # 1  -> false positive
em.compute_one_item("1.0", "1")    # 0  -> false NEGATIVE, contradicting homogeneize_numbers' docstring

Root cause: '1.0' → remove_punc → '10' → float → '10.0', and '3.14' → '314' → '314.0',
so different numbers normalize to the same string, while 1.0 and 1 normalize differently
('10.0' vs '1.0').
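
The chain above can be reproduced without installing lighteval. This is a minimal sketch: remove_punc and homogeneize_numbers below are simplified stand-ins written for illustration (assumed to mirror the inner helpers of helm_normalizer), and buggy_order is a hypothetical name for the old composition.

```python
import string


def remove_punc(text: str) -> str:
    # Stand-in (assumption): drop every ASCII punctuation character, "." included.
    return "".join(ch for ch in text if ch not in string.punctuation)


def homogeneize_numbers(text: str) -> str:
    # Stand-in (assumption): round-trip through float() when the token parses
    # as a number, otherwise return the token unchanged.
    try:
        return str(float(text))
    except ValueError:
        return text


def buggy_order(token: str) -> str:
    # Old order: punctuation is stripped first, so float() never sees the ".".
    return homogeneize_numbers(remove_punc(token))


for token in ("1.0", "1", "10", "3.14", "314"):
    print(f"{token!r} -> {buggy_order(token)!r}")
# '1.0'  -> '10.0'
# '1'    -> '1.0'
# '10'   -> '10.0'
# '3.14' -> '314.0'
# '314'  -> '314.0'
```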

Fix

Run homogeneize_numbers before remove_punc, so float() sees the original token. Distinct
numbers then stay distinct and equal-but-differently-formatted numbers match, per the docstring intent.
Non-numeric tokens are unaffected (homogeneize_numbers returns them unchanged either way).
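
A sketch of the reordering, under the same assumptions: the two helpers are simplified stand-ins rather than the library code, and old_order / new_order are hypothetical names for the two compositions.

```python
import string


def remove_punc(text: str) -> str:
    # Stand-in (assumption): drop every ASCII punctuation character.
    return "".join(ch for ch in text if ch not in string.punctuation)


def homogeneize_numbers(text: str) -> str:
    # Stand-in (assumption): canonicalize tokens that float() can parse.
    try:
        return str(float(text))
    except ValueError:
        return text


def old_order(token: str) -> str:
    return homogeneize_numbers(remove_punc(token))


def new_order(token: str) -> str:
    # New order: float() sees the original token, punctuation is stripped afterwards.
    return remove_punc(homogeneize_numbers(token))


for token in ("1.0", "1", "10", "3.14", "314"):
    print(f"{token!r} -> {new_order(token)!r}")
# '1.0'  -> '10'
# '1'    -> '10'
# '10'   -> '100'
# '3.14' -> '314'
# '314'  -> '3140'

# Purely alphabetic tokens come out the same under both orders.
for word in ("paris", "Paris!", "fox."):
    assert old_order(word) == new_order(word)
```

Because the stand-in remove_punc still strips the '.' from the canonical float string, this sketch only verifies the pairs named above; a pair such as '1.25' and '12.5' would both come out as '125' here.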

Tests

tests/test_unit_base_metrics.py::test_quasi_exact_match_numbers: asserts that 1.0 and 1 match,
while 10 vs 1.0 and 3.14 vs 314 do not. The test fails before the change and passes after; the
existing test_quasi_exact_match (sentence text) still passes.
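
For readers without the repository checked out, the cases above can be written as a table-driven check. This is only a sketch: normalize is a simplified stand-in for the reordered helm_normalizer, quasi_exact_match is a hypothetical helper rather than the ExactMatches API, and the sentence pair is made up for illustration.

```python
import re
import string


def normalize(text: str) -> str:
    """Simplified stand-in for helm_normalizer with the reordered steps (assumption)."""

    def homogeneize_numbers(tok: str) -> str:
        try:
            return str(float(tok))
        except ValueError:
            return tok

    def remove_punc(tok: str) -> str:
        return "".join(ch for ch in tok if ch not in string.punctuation)

    def remove_articles(tok: str) -> str:
        return re.sub(r"\b(a|an|the)\b", " ", tok)

    tokens = []
    for tok in re.split(" |-", text):
        # Numbers are canonicalized before punctuation is stripped.
        tok = remove_punc(homogeneize_numbers(tok.lower()))
        tok = " ".join(remove_articles(tok).split())
        if tok:
            tokens.append(tok)
    return " ".join(tokens)


def quasi_exact_match(gold: str, pred: str) -> int:
    # Hypothetical helper: 1 if both sides normalize to the same string, else 0.
    return int(normalize(gold) == normalize(pred))


CASES = [
    ("1.0", "1", 1),      # equal numbers, different formatting
    ("10", "1.0", 0),     # distinct numbers
    ("3.14", "314", 0),   # distinct numbers
    ("The quick brown fox.", "quick brown fox", 1),  # sentence text
]


def test_quasi_exact_match_numbers_sketch() -> None:
    for gold, pred, expected in CASES:
        assert quasi_exact_match(gold, pred) == expected, (gold, pred)


test_quasi_exact_match_numbers_sketch()
```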

In helm_normalizer, remove_punc ran before homogeneize_numbers, so it stripped the decimal point before the float() cast: '1.0' -> '10' -> '10.0'. This made distinct numbers collide ('10' and '1.0' both -> '10.0'; '3.14' and '314' both -> '314.0') and broke the function's own documented goal of treating '1.0' and '1' as equal (they normalized to '10.0' and '1.0'), causing false exact-match scores on numeric answers (QuAC/DROP-style quasi_exact_match). Run homogeneize_numbers before remove_punc so float() sees the original token.

iamsharduld wants to merge 1 commit into huggingface:main from iamsharduld:fix/helm-normalizer-numbers
