
Fix unhandled exception noise from background safetensors conversion thread#45752

Open
dhruv7477 wants to merge 1 commit intohuggingface:mainfrom
dhruv7477:fix/background-thread-conversion-error


@dhruv7477

The background thread Thread-auto_conversion in modeling_utils.py was spawned with ignore_errors_during_conversion=False. When get_repo_discussions() raised HfHubHTTPError with status 403 (because discussions are disabled on the repo), the exception propagated uncaught inside the thread, and Python's default threading.excepthook printed the full traceback to stderr. This is the noise reported in #44403.

Since this thread is explicitly fire-and-forget (the comment on line 720 reads "try to launch safetensors conversion for next time"), errors from it should never surface to the user. Changing the flag to True causes auto_conversion to catch and suppress the exception cleanly.
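A minimal, self-contained sketch of this mechanism follows. It does not import transformers or huggingface_hub: auto_conversion_sketch, fake_get_repo_discussions and DiscussionsDisabledError are hypothetical stand-ins for auto_conversion, get_repo_discussions() and the HfHubHTTPError 403, and the sketch assumes the real helper re-raises unless ignore_errors_during_conversion is True. Instead of letting the default hook print to stderr, it temporarily replaces threading.excepthook (Python 3.8+) to record which exceptions escape the thread.

```python
import threading


class DiscussionsDisabledError(Exception):
    """Hypothetical stand-in for the HfHubHTTPError 403 from get_repo_discussions()."""


def fake_get_repo_discussions(repo_id):
    # Hypothetical: simulates a repo whose discussions are disabled.
    raise DiscussionsDisabledError(f"403 Forbidden: discussions disabled for {repo_id}")


def auto_conversion_sketch(repo_id, ignore_errors_during_conversion=False):
    # Hypothetical simplification of a best-effort conversion helper
    # (an assumption, not the actual transformers implementation).
    try:
        fake_get_repo_discussions(repo_id)
    except Exception:
        if not ignore_errors_during_conversion:
            raise  # escapes the thread and reaches threading.excepthook
        return None  # fire-and-forget: swallow the error silently


def run_in_background(ignore_errors):
    """Run the helper in a named thread; return names of exceptions that escaped it."""
    escaped = []
    original_hook = threading.excepthook
    # threading.excepthook is called for any exception not caught inside a thread;
    # the default implementation prints the traceback to stderr.
    threading.excepthook = lambda args: escaped.append(args.exc_type.__name__)
    try:
        worker = threading.Thread(
            target=auto_conversion_sketch,
            args=("some-org/some-model",),
            kwargs={"ignore_errors_during_conversion": ignore_errors},
            name="Thread-auto_conversion",
        )
        worker.start()
        worker.join()  # the hook has run by the time join() returns
    finally:
        threading.excepthook = original_hook  # always restore the default hook
    return escaped


print(run_in_background(ignore_errors=False))  # ['DiscussionsDisabledError']
print(run_in_background(ignore_errors=True))   # []
```

With the flag set to False, the exception reaches threading.excepthook, which is what produces the stderr traceback in a real run. With the flag set to True, the error is handled inside the thread and nothing escapes.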

A prior attempt (#44440) was closed: it was opened two days after the issue, while discussion was still ongoing, and it also made additional changes to safetensors_conversion.py. This PR is a single-line fix to the actual root cause.

Fixes #44403


Tests run: python -m pytest tests/utils/test_modeling_utils.py -x -v
Result: 94 passed, 38 skipped, 1 xfailed

I used an AI assistant to help trace the root cause and identify the relevant code path, but I reviewed the change, ran the tests, and verified the fix against the stack trace in the issue.

…thread

Signed-off-by: Dhruv Sharma <dhruv7477@gmail.com>
@github-actions
Contributor

github-actions Bot commented May 3, 2026

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=45752&sha=ac7fda

@dhruv7477
Author

Re: Failing CI checks

The two failing tests (test_tp_generation for exaone4 and test_ep_forward for gpt_oss) are pre-existing failures unrelated to this PR.

GptOssModelTest::test_ep_forward: tracked by open issue #45161 ("Only TP not working with GPT-OSS MoE model", filed April 1, 2026). The ProcessRaisedException in EP forward is a known infrastructure issue.

Exaone4ModelTest::test_tp_generation: SIGABRT (process crash) in the distributed subprocess. This is a CUDA/NCCL-level crash that cannot be caused by changing the ignore_errors_during_conversion flag on a background Python thread in modeling_utils.py.

This PR touches a single line in the background safetensors conversion thread. It has no interaction with tensor-parallel or expert-parallel computation paths. Could a maintainer re-run the required run_tests check?



Development

Successfully merging this pull request may close these issues.

Unnecessary noise when loading a transformer
