Fix OOM in CI by reducing image size of tiny Gemma3 model #5680
Open
albertvillanova wants to merge 1 commit into main from
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Member
thanks! can you just confirm with a forward + GPU peak-memory measurement for old vs new?
Member (Author)
Thanks for your sensible suggestion: the difference is small. I'll continue investigating...
Member (Author)
I think the
Fix OOM in CI by reducing image size of tiny Gemma3 model.
This PR introduces a targeted adjustment for the `google/gemma-3-4b-it` model in the `generate_tiny_models` script to address memory usage issues related to image processing.

Partial fix for:
Motivation
The `tiny-Gemma3ForConditionalGeneration` model was generated with the default SigLIP image size of 896×896, which produces 4,096 patches per image. During training, the vision encoder attention maps have shape `[batch, heads, 4096, 4096]`, consuming ~1 GB per layer. With 2 vision layers and backpropagation, a single Gemma3 test consumes 5–7 GiB of GPU memory. Two such tests running concurrently on a 14.74 GiB GPU caused CUDA out-of-memory errors in all other parallel workers.

Solution
Override `image_size=224` (256 patches) when generating the tiny Gemma3 model. This is consistent with `mm_tokens_per_image=256` in the Gemma3 config: the projector's `AvgPool2d` gets `kernel_size=1` (identity), which is architecturally valid. The processor's image processor size is updated to match, so that test inputs are also resized to 224×224.

Changes
Model-specific configuration: for the `google/gemma-3-4b-it` model, sets `vision_config["image_size"]` and `processor.image_processor.size` to 224×224 (instead of the default 896×896) to reduce memory consumption during training, by limiting the number of image patches and ensuring the projector's average pooling layer acts as an identity function.

Note
Low Risk
Model-generation script change scoped to a single model ID; it only adjusts test image resolution/config and shouldn’t affect runtime code paths beyond tiny model artifacts.
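The patch-count and pooling arithmetic above can be sanity-checked with a short script. This is only a sketch: the SigLIP patch size of 14 is an assumption taken from the standard SigLIP configuration, and the batch and head counts are illustrative rather than measured values.

```python
import math

PATCH_SIZE = 14  # assumed SigLIP patch size; 896 / 14 = 64 patches per side


def num_patches(image_size: int, patch_size: int = PATCH_SIZE) -> int:
    """Number of patches a ViT-style encoder produces for a square image."""
    side = image_size // patch_size
    return side * side


def pool_kernel(image_size: int, mm_tokens_per_image: int = 256) -> int:
    """Kernel size the projector's AvgPool2d needs to hit mm_tokens_per_image."""
    patches_per_side = image_size // PATCH_SIZE
    tokens_per_side = math.isqrt(mm_tokens_per_image)
    return patches_per_side // tokens_per_side


print(num_patches(896))  # 4096 patches, as in the Motivation
print(num_patches(224))  # 256 patches after the override
print(pool_kernel(896))  # 4: the default size needs 4x4 average pooling
print(pool_kernel(224))  # 1: AvgPool2d(kernel_size=1) is an identity

# One fp32 attention map has batch * heads * patches^2 elements; at 4096
# patches, even a modest batch/head count reaches the ~1 GB/layer figure.
gib = lambda n_elems: n_elems * 4 / 2**30
print(f"{gib(2 * 8 * 4096**2):.2f} GiB vs {gib(2 * 8 * 256**2):.4f} GiB")
```

With these (illustrative) batch/head values the old resolution costs exactly 1 GiB per attention map, while the new one costs under 4 MiB, which matches the motivation's per-layer estimate.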
Overview
Reduces memory usage for the generated tiny Gemma3 vision-language test model by overriding the SigLIP image resolution when `model_id == "google/gemma-3-4b-it"`. `scripts/generate_tiny_models.py` now sets `vision_config["image_size"] = 224` and aligns `processor.image_processor.size` to 224×224, cutting the patch count and preventing CI GPU OOMs during Gemma3 training/tests.

Reviewed by Cursor Bugbot for commit 15c5aff.
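A minimal sketch of the kind of model-specific override the PR describes. The hook name `apply_model_overrides` and the stand-in objects below are hypothetical; the actual change lives in `scripts/generate_tiny_models.py` and may be shaped differently.

```python
from types import SimpleNamespace


def apply_model_overrides(model_id: str, config, processor):
    """Hypothetical per-model hook: shrink Gemma3's vision tower for tiny models."""
    if model_id == "google/gemma-3-4b-it":
        # 224 // 14 = 16 patches per side -> 256 patches, matching
        # mm_tokens_per_image=256, so the projector's AvgPool2d is an identity.
        config.vision_config["image_size"] = 224
        processor.image_processor.size = {"height": 224, "width": 224}
    return config, processor


# Stand-in objects for demonstration only (not real transformers classes).
config = SimpleNamespace(vision_config={"image_size": 896})
processor = SimpleNamespace(
    image_processor=SimpleNamespace(size={"height": 896, "width": 896})
)
config, processor = apply_model_overrides("google/gemma-3-4b-it", config, processor)
print(config.vision_config["image_size"])  # 224
print(processor.image_processor.size)      # {'height': 224, 'width': 224}
```

Scoping the override to an exact `model_id` match keeps every other tiny model's generation path untouched, which is why the risk note above calls this change low risk.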