fix: fix OOM #2285

Merged: terrykong merged 2 commits into main from yukih/fix-oom on Apr 18, 2026

Conversation

Contributor

@yuki-97 yuki-97 commented Apr 18, 2026

Fixes the OOM issue introduced by #2249.
Validated that grpo-qwen3-30ba3b-8n8g-megatron passes.

Signed-off-by: Yuki Huang <yukih@nvidia.com>

copy-pr-bot bot commented Apr 18, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@yuki-97 yuki-97 added the CI:Lfast label (runs a fast test suite and reuses the nightly `main` container, but syncs dependencies to the PR's version) Apr 18, 2026
Contributor Author

yuki-97 commented Apr 18, 2026

/ok to test 645cd78

@yuki-97 yuki-97 marked this pull request as ready for review April 18, 2026 04:42
@yuki-97 yuki-97 requested a review from a team as a code owner April 18, 2026 04:42
@yuki-97 yuki-97 requested a review from terrykong April 18, 2026 04:42
@yuki-97 yuki-97 added the r0.6.0 label Apr 18, 2026
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Contributor Author

yuki-97 commented Apr 18, 2026

/ok to test 44b5a21

Collaborator

@terrykong terrykong left a comment

LGTM

@terrykong terrykong merged commit 043775c into main Apr 18, 2026
27 checks passed
@terrykong terrykong deleted the yukih/fix-oom branch April 18, 2026 05:29
svcnvidia-nemo-ci pushed a commit that referenced this pull request Apr 18, 2026
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>

Labels

CI:Lfast: runs a fast test suite and reuses the nightly `main` container (but syncs dependencies to the PR's version)
r0.6.0
