feat: add Gemma4 support #2224
Conversation
/ok to test b3b4d3c
/ok to test 360cb8a
/ok to test 7353904
/ok to test e90e80c
/ok to test 04fc41c
/ok to test 9d9fd36
/ok to test 41fe867
Addresses CI lint + build failures on PR NVIDIA-NeMo#2224:
- tools/config_cli.py minimize --in-place on the 3 recipes flagged by configs-minimize-check (26B-A4B DAPO, E2B-it DAPO, E4B VLM GRPO).
- uv lock to sync uv.lock with pyproject after the jQizhang merge.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>
|
/ok to test 4851c9b
Replace copy-pasted 26B thresholds with values derived from the dapo-gemma4-31b-it-4n8g-fsdp2-automodel-offpolicy run in wandb project nemorl-gemma4-support (chained runs nsrv9mcy..yjc5ti1g, steps 1-404). Observed at step 20: token_mult_prob_error=1.008, gen_kl_error=5e-4, reward=0.23, filtered_reward=-0.23.
Tightened the token_mult_prob_error median gate to 1.05, added a gen_kl_error<0.002 gate to match Qwen3.5 DAPO, and set reward / filtered_reward step-20 floors at 0.1 / -0.35 (~0.12 headroom).
Promote the 31B test out of disabled.txt into nightly.txt under the Gemma 4 functional runs section now that it has a real baseline.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>
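For reference, a minimal sketch of the gates described in this commit, assuming a hypothetical check function (the real NeMo-RL functional-test harness and its median computation are different; only the metric names and threshold values come from the commit message):

```python
# Hypothetical illustration of the step-20 gates described above. Only the
# metric names and thresholds are taken from the commit message; the real
# NeMo-RL test harness is different.

GATES = {
    "token_mult_prob_error_median_max": 1.05,  # tightened median gate
    "gen_kl_error_max": 0.002,                 # matches the Qwen3.5 DAPO gate
    "reward_step20_min": 0.1,                  # ~0.12 headroom vs observed 0.23
    "filtered_reward_step20_min": -0.35,       # ~0.12 headroom vs observed -0.23
}


def check_step20_gates(metrics: dict) -> list:
    """Return human-readable failures for a single step-20 metrics snapshot.

    For simplicity the median gate is applied to the single snapshot value;
    the real check would take the median over training steps.
    """
    failures = []
    if metrics["token_mult_prob_error"] > GATES["token_mult_prob_error_median_max"]:
        failures.append("token_mult_prob_error above 1.05 median gate")
    if metrics["gen_kl_error"] >= GATES["gen_kl_error_max"]:
        failures.append("gen_kl_error above 0.002 gate")
    if metrics["reward"] < GATES["reward_step20_min"]:
        failures.append("reward below step-20 floor of 0.1")
    if metrics["filtered_reward"] < GATES["filtered_reward_step20_min"]:
        failures.append("filtered_reward below step-20 floor of -0.35")
    return failures


# Values observed at step 20 of the wandb run referenced above.
observed = {
    "token_mult_prob_error": 1.008,
    "gen_kl_error": 5e-4,
    "reward": 0.23,
    "filtered_reward": -0.23,
}
assert check_step20_gates(observed) == []
```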
- Bump 3rdparty/Automodel-workspace/Automodel fb62eb48 -> 79ce7b20 (upstream/main). Picks up NVIDIA-NeMo#1913 "fix(moe): align EP expert weight dtype with activation dtype" plus four additional upstream fixes.
- Regenerate uv.lock for the new submodule SHA.
- Switch dapo-gemma4-31b-it-4n8g-fsdp2-automodel.yaml and dapo-gemma4-e2b-it-1n8g-fsdp2-automodel.yaml from hard-coded local lustre paths to HuggingFace model IDs (google/gemma-4-31B-it, google/gemma-4-E2B-it) for both policy.model_name and policy.tokenizer.name.
- Add vllm_cfg.gpu_memory_utilization=0.5 to the 31B recipe to match the E2B recipe and avoid vLLM OOM at 4n8g.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>
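As a rough sketch, the recipe changes above expressed as dotted-path overrides (the exact nesting of these keys in the recipe YAMLs may differ; only the key names and values below are taken from the commit message):

```python
# Sketch of the recipe changes described above as dotted-path overrides.
# The shipped YAML layout may nest these keys differently.

GEMMA4_31B_OVERRIDES = {
    # HuggingFace model IDs instead of hard-coded local lustre paths.
    "policy.model_name": "google/gemma-4-31B-it",
    "policy.tokenizer.name": "google/gemma-4-31B-it",
    # Matches the E2B recipe and avoids vLLM OOM at 4n8g.
    "vllm_cfg.gpu_memory_utilization": 0.5,
}

GEMMA4_E2B_OVERRIDES = {
    "policy.model_name": "google/gemma-4-E2B-it",
    "policy.tokenizer.name": "google/gemma-4-E2B-it",
}
```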
/ok to test 17443c0
Hey @sharonyu-115, thanks for this! I've been trying to get Gemma4 working in Nemo-RL/Nemo-Gym with this in-progress PR. I think the move to the OpenAIServingRender API might block passing tool call parameters to the vllm server the way the Gym examples (grpo_nanov3.yaml, for example) do. Let me know if I'm wrong about this - I think all you'd need to do is pass those options through when building the serving objects. Really appreciate your work here, looking forward to this getting merged!
Hi @saltwick, yes, exactly! Thanks for catching this. I'll update the code to pass those options through. You're also more than welcome to open a PR directly to this branch if you already have a fix ready. Thank you very much!
@sharonyu-115 here's the fix that worked for me - feel free to chop it up as you need! sharonyu-115#7
@saltwick Merged. Thank you very much!!
…er_async.py Signed-off-by: Sam Saltwick <sam@saltwick.com>
fix: pass VLLM serving kwargs through OpenAIServingRender
Force-pushed from f1697ff to eabb65f
…s not available in vllm 0.19.0
Follow-up PR to fix for vllm==0.19.0. Sorry about that!
fix: remove reasoning_parser from OpenAIServingRender args
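For anyone following along, a hypothetical sketch of the pass-through idea from this thread. Everything here except the names OpenAIServingRender and reasoning_parser is made up for illustration; the real constructor arguments in this PR and in vLLM differ.

```python
# Hypothetical sketch: forward caller-supplied vLLM serving kwargs (e.g.
# tool-call parsing options) when constructing the render helper instead of
# dropping them. Names and signatures are illustrative only.

from typing import Any


def build_render_helper(render_cls: type, base_args: dict,
                        extra_serving_kwargs: dict) -> Any:
    """Build an OpenAIServingRender-style helper, passing extra serving
    options straight through to its constructor."""
    kwargs = dict(base_args)
    kwargs.update(extra_serving_kwargs)
    # reasoning_parser is not available in vllm 0.19.0, so strip it if
    # present (see the follow-up fix above).
    kwargs.pop("reasoning_parser", None)
    return render_cls(**kwargs)
```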
What does this PR do?
Adds Gemma 4 support to NeMo-RL with DAPO and GRPO recipes across dense and MoE variants, plus a VLM recipe.
Issue
List issues that this PR closes:
#2212
Summary of code changes:
OpenAIServingTokenization now receives the render helper as a dependency instead of inheriting the mixin.
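A purely illustrative sketch of this refactor, using simplified stand-in classes (the real OpenAIServingTokenization and render helper live in vLLM / this PR and have different signatures):

```python
# Illustrative-only: the serving class takes the render helper as a
# constructor dependency rather than inheriting it as a mixin, so serving
# options can be configured once on the helper and shared.


class RenderHelper:
    """Stand-in for the OpenAIServingRender-style helper."""

    def render_prompt(self, messages: list) -> str:
        return "\n".join(m.get("content", "") for m in messages)


class ServingTokenization:
    """Before: `class ServingTokenization(RenderMixin, ...)`.
    After: the helper is injected."""

    def __init__(self, render_helper: RenderHelper):
        self._render = render_helper

    def tokenize(self, messages: list) -> list:
        # Delegate prompt rendering to the injected helper.
        return self._render.render_prompt(messages).split()
```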
Test suites
Usage
New Gemma 4 recipes (the ones referenced in this PR include dapo-gemma4-31b-it-4n8g-fsdp2-automodel.yaml, dapo-gemma4-e2b-it-1n8g-fsdp2-automodel.yaml, the 26B-A4B DAPO recipe, the E2B-it DAPO recipe, and the E4B VLM GRPO recipe).
Wandb link (WIP): https://wandb.ai/ys_fishcool-nvidia/nemorl-gemma4-support?nw=nwuserys_fishcool
Before your PR is "Ready for review"
Pre checks:
Additional Information
Known issues: (TO BE FILLED)