
feat: add Gemma4 support#2224

Open
sharonyu-115 wants to merge 30 commits into NVIDIA-NeMo:main from sharonyu-115:gemma4-support

Conversation

@sharonyu-115
Contributor

@sharonyu-115 sharonyu-115 commented Apr 7, 2026

What does this PR do?

Adds Gemma 4 support to NeMo-RL with DAPO and GRPO recipes across dense and MoE variants, plus a VLM recipe.

Issue

List issues that this PR closes:
#2212

Summary of code changes:

  • pyproject.toml — transformers 5.3.0 → 5.5.0; vLLM 0.17.1 → 0.19.0; add mistral-common>=1.11.0
  • 3rdparty/Automodel-workspace/Automodel — bumped to upstream main to pick up the fixes needed by Gemma 4
  • nemo_rl/models/policy/utils.py — register gemma4 in both AUTOMODEL_FACTORY dicts.
  • nemo_rl/models/automodel/train.py — add a _needs_kv_cache_for_shared_layers(model) helper and thread a new use_cache kwarg through model_forward. Gemma 4 E2B has num_kv_shared_layers > 0 and needs use_cache=True so shared layers can pull K/V from anchor layers via DynamicCache (a TODO notes this can be removed once transformers ≥ 5.5.2 lands). Also injects mm_token_type_ids for text-only inputs; a rough sketch of these tweaks follows this list.
  • nemo_rl/models/policy/workers/dtensor_policy_worker.py — same mm_token_type_ids zero-tensor injection at all four forward-pass call sites on the DTensor path.
  • nemo_rl/models/automodel/setup.py — remove the hardcoded visual-encoder freeze; it is replaced by Automodel's declarative freeze_config in the recipe YAMLs (fixes checkpoint-resume key-mismatch issues). Also updates the existing Qwen3.5 YAML files affected by this change.
  • nemo_rl/models/generation/vllm/vllm_worker.py — extend the skip_tokenizer_init / chat-template workaround list to include Gemma4ForConditionalGeneration.
  • nemo_rl/models/generation/vllm/vllm_worker_async.py — adapt the async HTTP server to the vLLM 0.19.0 API: the shared _preprocess_chat mixin is replaced by a NeMoRLOpenAIServingRender subclass, and OpenAIServingChat / OpenAIServingTokenization now receive the render helper as a dependency instead of inheriting the mixin.
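
The two model-forward tweaks above (the shared-KV use_cache helper and the text-only mm_token_type_ids injection) roughly amount to the following. This is a hedged sketch, not the actual diff: the config attribute lookup and the model_forward signature are assumptions; only _needs_kv_cache_for_shared_layers, use_cache, mm_token_type_ids, and num_kv_shared_layers are named in the bullets above.

```python
import torch

def _needs_kv_cache_for_shared_layers(model) -> bool:
    # Gemma 4 E2B has num_kv_shared_layers > 0: its shared layers pull K/V from
    # anchor layers via DynamicCache, which is only populated when use_cache=True.
    # The exact config attribute path used here is an assumption.
    cfg = getattr(model, "config", None)
    text_cfg = getattr(cfg, "text_config", cfg)
    return getattr(text_cfg, "num_kv_shared_layers", 0) > 0

def model_forward(model, input_ids, attention_mask, **kwargs):
    # Text-only batches through a VLM checkpoint: inject an all-zero
    # mm_token_type_ids tensor so the multimodal masking path sees the key.
    if "mm_token_type_ids" not in kwargs:
        kwargs["mm_token_type_ids"] = torch.zeros_like(input_ids)
    return model(
        input_ids=input_ids,
        attention_mask=attention_mask,
        use_cache=_needs_kv_cache_for_shared_layers(model),
        **kwargs,
    )
```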

Test suites

  • nightly.txt — E2B-it DAPO + E4B VLM GRPO + 31B-it DAPO (functional).
  • release.txt — 26B-A4B-it DAPO (convergence; baseline wandb kouzgkf3).

Usage

New Gemma 4 recipes:

  • examples/configs/recipes/llm/dapo-gemma4-31b-it-4n8g-fsdp2-automodel.yaml — dense 31B-it DAPO.
  • examples/configs/recipes/llm/dapo-gemma4-26ba4b-it-4n8g-fsdp2-automodel.yaml — MoE 26B-A4B-it DAPO.
  • examples/configs/recipes/llm/dapo-gemma4-e2b-it-1n8g-fsdp2-automodel.yaml — E2B-it DAPO.
  • examples/configs/recipes/vlm/vlm_grpo-gemma4-e4b-geo3k-1n8g-automodel.yaml — E4B VLM GRPO on geometry3k.
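
As a usage illustration only — the run-script path below is an assumption (the standard NeMo-RL GRPO entry point; the DAPO and VLM recipes may use a different script in this branch) — a recipe is launched by pointing the run script at one of these YAMLs:

```bash
# Hypothetical launch command; substitute the entry script that matches the recipe.
uv run examples/run_grpo_math.py \
    --config examples/configs/recipes/llm/dapo-gemma4-e2b-it-1n8g-fsdp2-automodel.yaml
```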

Wandb link (WIP): https://wandb.ai/ys_fishcool-nvidia/nemorl-gemma4-support?nw=nwuserys_fishcool

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

Known issues: (TO BE FILLED)

@copy-pr-bot

copy-pr-bot Bot commented Apr 7, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@sharonyu-115
Contributor Author

/ok to test b3b4d3c

@sharonyu-115 sharonyu-115 added the CI:L1 Run doctests, unit tests, and functional tests label Apr 8, 2026
@zpqiu zpqiu changed the title Gemma4 support feat: add Gemma4 support Apr 8, 2026
@zpqiu zpqiu added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Apr 8, 2026
@sharonyu-115 sharonyu-115 added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Apr 8, 2026
@zpqiu zpqiu marked this pull request as ready for review April 8, 2026 05:36
@zpqiu zpqiu requested review from a team as code owners April 8, 2026 05:36
@zpqiu zpqiu added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Apr 8, 2026
@zpqiu zpqiu marked this pull request as draft April 8, 2026 05:37
@zpqiu
Contributor

zpqiu commented Apr 8, 2026

/ok to test 360cb8a

@sharonyu-115
Contributor Author

/ok to test 7353904

@sharonyu-115
Contributor Author

/ok to test e90e80c

@sharonyu-115
Contributor Author

/ok to test 04fc41c

@sharonyu-115
Contributor Author

/ok to test 9d9fd36

@github-actions

✅ Submodule Fast-Forward Check Results

Check based on commit: 41fe867 (PR #2224 from gemma4-support)

✅ Submodules that are properly updated:

Automodel: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨

@sharonyu-115
Contributor Author

/ok to test 41fe867

Addresses CI lint + build failures on PR NVIDIA-NeMo#2224:
- tools/config_cli.py minimize --in-place on the 3 recipes flagged by
  configs-minimize-check (26B-A4B DAPO, E2B-it DAPO, E4B VLM GRPO).
- uv lock to sync uv.lock with pyproject after the jQizhang merge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>
@sharonyu-115
Contributor Author

/ok to test 4851c9b

@github-actions

✅ Submodule Fast-Forward Check Results

Check based on commit: 4851c9b (PR #2224 from gemma4-support)

✅ Submodules that are properly updated:

Automodel: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨

Replace copy-pasted 26B thresholds with values derived from the
dapo-gemma4-31b-it-4n8g-fsdp2-automodel-offpolicy run in wandb project
nemorl-gemma4-support (chained runs nsrv9mcy..yjc5ti1g, steps 1-404).
Observed at step 20: token_mult_prob_error=1.008, gen_kl_error=5e-4,
reward=0.23, filtered_reward=-0.23. Tightened tmpe median gate to 1.05,
added gen_kl_error<0.002 gate to match Qwen3.5 DAPO, and set reward /
filtered_reward step-20 floors at 0.1 / -0.35 (~0.12 headroom).

Promote the 31B test out of disabled.txt into nightly.txt under the
Gemma 4 functional runs section now that it has a real baseline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>
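
Purely to make the gates above concrete — the functional-test harness and metric-log format are not shown in this PR, so the file name, JSON layout, and key names below are hypothetical — the step-20 checks amount to roughly:

```python
import json

# Hypothetical metric log keyed by step; the real harness and log format may differ.
with open("train_metrics.json") as f:
    metrics = json.load(f)

step20 = metrics["20"]
assert step20["token_mult_prob_error"] < 1.05   # tightened tmpe gate (a median in the real test)
assert step20["gen_kl_error"] < 0.002           # matches the Qwen3.5 DAPO gate
assert step20["reward"] > 0.1                   # observed 0.23, ~0.12 headroom
assert step20["filtered_reward"] > -0.35        # observed -0.23, ~0.12 headroom
```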
@github-actions

✅ Submodule Fast-Forward Check Results

Check based on commit: 9c82da6 (PR #2224 from gemma4-support)

✅ Submodules that are properly updated:

Automodel: ✅ PR branch is ahead of main branch (fast-forward)

All submodule changes look good! ✨

@sharonyu-115 sharonyu-115 marked this pull request as ready for review April 21, 2026 15:09
@sharonyu-115 sharonyu-115 requested review from a team as code owners April 21, 2026 15:09
- Bump 3rdparty/Automodel-workspace/Automodel fb62eb48 -> 79ce7b20
  (upstream/main). Picks up NVIDIA-NeMo#1913 "fix(moe): align EP expert weight
  dtype with activation dtype" plus four additional upstream fixes.
- Regenerate uv.lock for the new submodule SHA.
- Switch dapo-gemma4-31b-it-4n8g-fsdp2-automodel.yaml and
  dapo-gemma4-e2b-it-1n8g-fsdp2-automodel.yaml from hard-coded
  local lustre paths to HuggingFace model IDs (google/gemma-4-31B-it,
  google/gemma-4-E2B-it) for both policy.model_name and
  policy.tokenizer.name.
- Add vllm_cfg.gpu_memory_utilization=0.5 to the 31B recipe to
  match the E2B recipe and avoid vLLM OOM at 4n8g.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>
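
For reference, a sketch of the resulting 31B recipe fields after this commit — the dotted key paths come from the commit message, but the YAML nesting (in particular where vllm_cfg lives) is assumed from typical NeMo-RL recipe layout:

```yaml
policy:
  model_name: google/gemma-4-31B-it      # was a hard-coded local lustre path
  tokenizer:
    name: google/gemma-4-31B-it
  generation:                            # nesting of vllm_cfg under generation is an assumption
    vllm_cfg:
      gpu_memory_utilization: 0.5        # matches the E2B recipe; avoids vLLM OOM at 4n8g
```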
@sharonyu-115
Contributor Author

/ok to test 17443c0

@saltwick

Hey @sharonyu-115, thanks for this! Been trying to get Gemma4 working in Nemo-RL/Nemo-Gym with this in-progress PR.

I think the move to the OpenAIServingRender API might block passing tool call parameters to the vllm server like the Gym examples (grpo_nanov3.yaml for example) do for things like enable_auto_tools=True and tool_parser="gemma4".

Let me know if I'm wrong about this - I think all you'd need to do is pass those options through when building the openai_serving_render object.

Really appreciate your work here, looking forward to this getting merged!

@sharonyu-115
Contributor Author

Hey @sharonyu-115, thanks for this! Been trying to get Gemma4 working in Nemo-RL/Nemo-Gym with this in-progress PR.

I think the move to the OpenAIServingRender API might block passing tool call parameters to the vllm server like the Gym examples (grpo_nanov3.yaml for example) do for things like enable_auto_tools=True and tool_parser="gemma4".

Let me know if I'm wrong about this - I think all you'd need to do is pass those options through when building the openai_serving_render object.

Really appreciate your work here, looking forward to this getting merged!

Hi @saltwick, yes, exactly! Thanks for catching this. I'll update the code to pass those options through. You're also more than welcome to open a PR directly to this branch if you already have a fix ready. Thank you very much!
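
For concreteness, the pass-through being discussed might look like the sketch below. The class names come from this PR's description of vllm_worker_async.py, but their constructor signatures (and the config keys) are assumptions, not the actual API:

```python
def build_serving_objects(engine_client, model_config, vllm_cfg: dict):
    # Hypothetical wiring: forward tool-call options from the worker config
    # into the render helper so chat completions can do auto tool calling.
    tool_kwargs = {
        "enable_auto_tools": vllm_cfg.get("enable_auto_tools", False),
        "tool_parser": vllm_cfg.get("tool_parser"),  # e.g. "gemma4"
    }
    render = NeMoRLOpenAIServingRender(
        engine_client,
        model_config,
        **{k: v for k, v in tool_kwargs.items() if v is not None},
    )
    # Per this PR, the chat/tokenization handlers receive the render helper as
    # a dependency rather than inheriting the old _preprocess_chat mixin.
    chat = OpenAIServingChat(engine_client, model_config, render)
    return render, chat
```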

@saltwick

@sharonyu-115 here's the fix that worked for me - feel free to chop it up as you need! sharonyu-115#7

@sharonyu-115
Contributor Author

sharonyu-115 commented Apr 28, 2026

@sharonyu-115 here's the fix that worked for me - feel free to chop it up as you need! sharonyu-115#7

@saltwick Merged. Thank you very much!!

saltwick and others added 2 commits April 28, 2026 09:07
…er_async.py

Signed-off-by: Sam Saltwick <sam@saltwick.com>
fix: pass VLLM serving kwargs through OpenAIServingRender
@saltwick

Follow-up PR to fix for vllm==0.19.0. Sorry about that!

fix: remove reasoning_parser from OpenAIServingRender args