
feat: add Gemma4 support#2224

Open
sharonyu-115 wants to merge 30 commits into NVIDIA-NeMo:main from sharonyu-115:gemma4-support

Conversation

@sharonyu-115
Contributor

@sharonyu-115 sharonyu-115 commented Apr 7, 2026

What does this PR do?

Adds Gemma 4 support to NeMo-RL with DAPO and GRPO recipes across dense and MoE variants, plus a VLM recipe.

Issue

List issues that this PR closes:
#2212

Summary of code changes:

  • pyproject.toml — transformers 5.3.0 → 5.5.0; vLLM 0.17.1 → 0.19.0; add mistral-common>=1.11.0
  • 3rdparty/Automodel-workspace/Automodel — bumped to upstream main to pick up the fixes needed by Gemma 4
  • nemo_rl/models/policy/utils.py — register gemma4 in both AUTOMODEL_FACTORY dicts.
  • nemo_rl/models/automodel/train.py — add a _needs_kv_cache_for_shared_layers(model) helper and thread a new use_cache kwarg through model_forward. Gemma 4 E2B has num_kv_shared_layers > 0 and needs use_cache=True so shared layers can pull K/V from anchor layers via DynamicCache (a TODO notes this can be removed once transformers ≥ 5.5.2 lands). Also injects mm_token_type_ids for text-only inputs; a rough sketch of these tweaks follows this list.
  • nemo_rl/models/policy/workers/dtensor_policy_worker.py — same mm_token_type_ids zero-tensor injection at all four forward-pass call sites on the DTensor path.
  • nemo_rl/models/automodel/setup.py — remove the hardcoded visual-encoder freeze; it is replaced by Automodel's declarative freeze_config in the recipe YAMLs (fixes checkpoint-resume key-mismatch issues). Also updates the existing Qwen3.5 YAML files affected by this change.
  • nemo_rl/models/generation/vllm/vllm_worker.py — extend the skip_tokenizer_init / chat-template workaround list to include Gemma4ForConditionalGeneration.
  • nemo_rl/models/generation/vllm/vllm_worker_async.py — adapt the async HTTP server to the vLLM 0.19.0 API: the shared _preprocess_chat mixin is replaced by a NeMoRLOpenAIServingRender subclass, and OpenAIServingChat / OpenAIServingTokenization now receive the render helper as a dependency instead of inheriting the mixin.
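
The two model-forward tweaks above (the shared-KV use_cache helper and the text-only mm_token_type_ids injection) roughly amount to the following. This is a hedged sketch, not the actual diff: the config attribute lookup and the model_forward signature are assumptions; only _needs_kv_cache_for_shared_layers, use_cache, mm_token_type_ids, and num_kv_shared_layers are named in the bullets above.

```python
import torch

def _needs_kv_cache_for_shared_layers(model) -> bool:
    # Gemma 4 E2B has num_kv_shared_layers > 0: its shared layers pull K/V from
    # anchor layers via DynamicCache, which is only populated when use_cache=True.
    # The exact config attribute path used here is an assumption.
    cfg = getattr(model, "config", None)
    text_cfg = getattr(cfg, "text_config", cfg)
    return getattr(text_cfg, "num_kv_shared_layers", 0) > 0

def model_forward(model, input_ids, attention_mask, **kwargs):
    # Text-only batches through a VLM checkpoint: inject an all-zero
    # mm_token_type_ids tensor so the multimodal masking path sees the key.
    if "mm_token_type_ids" not in kwargs:
        kwargs["mm_token_type_ids"] = torch.zeros_like(input_ids)
    return model(
        input_ids=input_ids,
        attention_mask=attention_mask,
        use_cache=_needs_kv_cache_for_shared_layers(model),
        **kwargs,
    )
```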

Test suites

  • nightly.txt — E2B-it DAPO + E4B VLM GRPO + 31B-it DAPO (functional).
  • release.txt — 26B-A4B-it DAPO (convergence; baseline wandb kouzgkf3).

Usage

New Gemma 4 recipes:

  • examples/configs/recipes/llm/dapo-gemma4-31b-it-4n8g-fsdp2-automodel.yaml — dense 31B-it DAPO.
  • examples/configs/recipes/llm/dapo-gemma4-26ba4b-it-4n8g-fsdp2-automodel.yaml — MoE 26B-A4B-it DAPO.
  • examples/configs/recipes/llm/dapo-gemma4-e2b-it-1n8g-fsdp2-automodel.yaml — E2B-it DAPO.
  • examples/configs/recipes/vlm/vlm_grpo-gemma4-e4b-geo3k-1n8g-automodel.yaml — E4B VLM GRPO on geometry3k.
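
As a usage illustration only — the run-script path below is an assumption (the standard NeMo-RL GRPO entry point; the DAPO and VLM recipes may use a different script in this branch) — a recipe is launched by pointing the run script at one of these YAMLs:

```bash
# Hypothetical launch command; substitute the entry script that matches the recipe.
uv run examples/run_grpo_math.py \
    --config examples/configs/recipes/llm/dapo-gemma4-e2b-it-1n8g-fsdp2-automodel.yaml
```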

Wandb link (WIP): https://wandb.ai/ys_fishcool-nvidia/nemorl-gemma4-support?nw=nwuserys_fishcool

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

Known issues: (TO BE FILLED)

@copy-pr-bot

copy-pr-bot Bot commented Apr 7, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@sharonyu-115
Contributor Author

/ok to test b3b4d3c

@sharonyu-115 sharonyu-115 added the CI:L1 Run doctests, unit tests, and functional tests label Apr 8, 2026
@zpqiu zpqiu changed the title Gemma4 support feat: add Gemma4 support Apr 8, 2026
@zpqiu zpqiu added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Apr 8, 2026
@sharonyu-115 sharonyu-115 added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Apr 8, 2026
@zpqiu zpqiu marked this pull request as ready for review April 8, 2026 05:36
@zpqiu zpqiu requested review from a team as code owners April 8, 2026 05:36
@zpqiu zpqiu added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Apr 8, 2026
@zpqiu zpqiu marked this pull request as draft April 8, 2026 05:37
@zpqiu
Contributor

zpqiu commented Apr 8, 2026

/ok to test 360cb8a

@sharonyu-115
Contributor Author

/ok to test 7353904

@sharonyu-115
Contributor Author

/ok to test e90e80c

@sharonyu-115
Contributor Author

/ok to test 04fc41c

@sharonyu-115
Contributor Author

/ok to test 9d9fd36

@github-actions

✅ Submodule Fast-Forward Check Results

Check based on commit: 41fe867 (PR #2224 from gemma4-support)

✅ Submodules that are properly updated:

Automodel: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨

@sharonyu-115
Contributor Author

/ok to test 41fe867

Addresses CI lint + build failures on PR NVIDIA-NeMo#2224:
- tools/config_cli.py minimize --in-place on the 3 recipes flagged by
  configs-minimize-check (26B-A4B DAPO, E2B-it DAPO, E4B VLM GRPO).
- uv lock to sync uv.lock with pyproject after the jQizhang merge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>
@sharonyu-115
Contributor Author

/ok to test 4851c9b

@github-actions

✅ Submodule Fast-Forward Check Results

Check based on commit: 4851c9b (PR #2224 from gemma4-support)

✅ Submodules that are properly updated:

Automodel: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨

Replace copy-pasted 26B thresholds with values derived from the
dapo-gemma4-31b-it-4n8g-fsdp2-automodel-offpolicy run in wandb project
nemorl-gemma4-support (chained runs nsrv9mcy..yjc5ti1g, steps 1-404).
Observed at step 20: token_mult_prob_error=1.008, gen_kl_error=5e-4,
reward=0.23, filtered_reward=-0.23. Tightened tmpe median gate to 1.05,
added gen_kl_error<0.002 gate to match Qwen3.5 DAPO, and set reward /
filtered_reward step-20 floors at 0.1 / -0.35 (~0.12 headroom).

Promote the 31B test out of disabled.txt into nightly.txt under the
Gemma 4 functional runs section now that it has a real baseline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>
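
Purely to make the gates above concrete — the functional-test harness and metric-log format are not shown in this PR, so the file name, JSON layout, and key names below are hypothetical — the step-20 checks amount to roughly:

```python
import json

# Hypothetical metric log keyed by step; the real harness and log format may differ.
with open("train_metrics.json") as f:
    metrics = json.load(f)

step20 = metrics["20"]
assert step20["token_mult_prob_error"] < 1.05   # tightened tmpe gate (a median in the real test)
assert step20["gen_kl_error"] < 0.002           # matches the Qwen3.5 DAPO gate
assert step20["reward"] > 0.1                   # observed 0.23, ~0.12 headroom
assert step20["filtered_reward"] > -0.35        # observed -0.23, ~0.12 headroom
```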
@github-actions

✅ Submodule Fast-Forward Check Results

Check based on commit: 9c82da6 (PR #2224 from gemma4-support)

✅ Submodules that are properly updated:

Automodel: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨

@sharonyu-115 sharonyu-115 marked this pull request as ready for review April 21, 2026 15:09
@sharonyu-115 sharonyu-115 requested review from a team as code owners April 21, 2026 15:09
- Bump 3rdparty/Automodel-workspace/Automodel fb62eb48 -> 79ce7b20
  (upstream/main). Picks up NVIDIA-NeMo#1913 "fix(moe): align EP expert weight
  dtype with activation dtype" plus four additional upstream fixes.
- Regenerate uv.lock for the new submodule SHA.
- Switch dapo-gemma4-31b-it-4n8g-fsdp2-automodel.yaml and
  dapo-gemma4-e2b-it-1n8g-fsdp2-automodel.yaml from hard-coded
  local lustre paths to HuggingFace model IDs (google/gemma-4-31B-it,
  google/gemma-4-E2B-it) for both policy.model_name and
  policy.tokenizer.name.
- Add vllm_cfg.gpu_memory_utilization=0.5 to the 31B recipe to
  match the E2B recipe and avoid vLLM OOM at 4n8g.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>
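
For reference, a sketch of the resulting 31B recipe fields after this commit — the dotted key paths come from the commit message, but the YAML nesting (in particular where vllm_cfg lives) is assumed from typical NeMo-RL recipe layout:

```yaml
policy:
  model_name: google/gemma-4-31B-it      # was a hard-coded local lustre path
  tokenizer:
    name: google/gemma-4-31B-it
  generation:                            # nesting of vllm_cfg under generation is an assumption
    vllm_cfg:
      gpu_memory_utilization: 0.5        # matches the E2B recipe; avoids vLLM OOM at 4n8g
```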
@sharonyu-115
Contributor Author

/ok to test 17443c0

@saltwick

Hey @sharonyu-115, thanks for this! Been trying to get Gemma4 working in Nemo-RL/Nemo-Gym with this in-progress PR.

I think the move to the OpenAIServingRender API might block passing tool call parameters to the vllm server like the Gym examples (grpo_nanov3.yaml for example) do for things like enable_auto_tools=True and tool_parser="gemma4".

Let me know if I'm wrong about this - I think all you'd need to do is pass those options through when building the openai_serving_render object.

Really appreciate your work here, looking forward to this getting merged!

@sharonyu-115
Contributor Author

Hey @sharonyu-115, thanks for this! Been trying to get Gemma4 working in Nemo-RL/Nemo-Gym with this in-progress PR.

I think the move to the OpenAIServingRender API might block passing tool call parameters to the vllm server like the Gym examples (grpo_nanov3.yaml for example) do for things like enable_auto_tools=True and tool_parser="gemma4".

Let me know if I'm wrong about this - I think all you'd need to do is pass those options through when building the openai_serving_render object.

Really appreciate your work here, looking forward to this getting merged!

Hi @saltwick, yes, exactly! Thanks for catching this. I'll update the code to pass those options through. You're also more than welcome to open a PR directly to this branch if you already have a fix ready. Thank you very much!
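
For concreteness, the pass-through being discussed might look like the sketch below. The class names come from this PR's description of vllm_worker_async.py, but their constructor signatures (and the config keys) are assumptions, not the actual API:

```python
def build_serving_objects(engine_client, model_config, vllm_cfg: dict):
    # Hypothetical wiring: forward tool-call options from the worker config
    # into the render helper so chat completions can do auto tool calling.
    tool_kwargs = {
        "enable_auto_tools": vllm_cfg.get("enable_auto_tools", False),
        "tool_parser": vllm_cfg.get("tool_parser"),  # e.g. "gemma4"
    }
    render = NeMoRLOpenAIServingRender(
        engine_client,
        model_config,
        **{k: v for k, v in tool_kwargs.items() if v is not None},
    )
    # Per this PR, the chat/tokenization handlers receive the render helper as
    # a dependency rather than inheriting the old _preprocess_chat mixin.
    chat = OpenAIServingChat(engine_client, model_config, render)
    return render, chat
```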

@saltwick

@sharonyu-115 here's the fix that worked for me - feel free to chop it up as you need! sharonyu-115#7

@sharonyu-115
Contributor Author

sharonyu-115 commented Apr 28, 2026

@sharonyu-115 here's the fix that worked for me - feel free to chop it up as you need! sharonyu-115#7

@saltwick Merged. Thank you very much!!

saltwick and others added 2 commits April 28, 2026 09:07
…er_async.py

Signed-off-by: Sam Saltwick <sam@saltwick.com>
fix: pass VLLM serving kwargs through OpenAIServingRender
@saltwick

Follow-up PR to fix for vllm==0.19.0. Sorry about that!

fix: remove reasoning_parser from OpenAIServingRender args