
Add Gemma 3n model to KerasHub #2606

Open
laxmareddyp wants to merge 20 commits into keras-team:master from laxmareddyp:gemma3n_model

Conversation

Collaborator

@laxmareddyp laxmareddyp commented Feb 19, 2026

Description of the change

This PR completes the implementation of the Gemma 3n model, building upon the foundations laid in #2404.
It introduces critical architectural features, ensures numerical accuracy against the reference implementation, and streamlines the codebase for production readiness.

Note: Special thanks to @harshaljanjani for the initial work and foundations laid in #2404.

Key Changes & Improvements:

KV Sharing Implementation:

  • Integrated Key-Value (KV) sharing to optimize memory usage and inference efficiency, aligning with the Gemma 3 architecture.
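The memory-saving idea behind KV sharing can be sketched as follows. The layer layout here (which layers share, and from where) is purely illustrative, not the actual Gemma 3n configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
num_layers, seq, dim = 6, 5, 8

# Hypothetical layout: the last 2 layers reuse the KV of layer 3 instead
# of computing their own projections.
kv_share_from = {4: 3, 5: 3}

kv_cache = {}
x = rng.normal(size=(seq, dim))
for layer in range(num_layers):
    if layer in kv_share_from:
        # Reuse an earlier layer's keys/values: no new projection, no new cache.
        k, v = kv_cache[kv_share_from[layer]]
    else:
        k = x @ rng.normal(size=(dim, dim))
        v = x @ rng.normal(size=(dim, dim))
        kv_cache[layer] = (k, v)
    # ... attention for this layer would consume (k, v) here ...

# Only 4 of the 6 layers hold a KV cache entry.
assert len(kv_cache) == 4
```

During generation this shrinks the KV cache (and the per-step projection work) in proportion to the number of sharing layers.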

Causal Masking:

  • Implemented proper causal masking logic to ensure the model correctly handles autoregressive sequences.
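For reference, a minimal numpy sketch of the mask this implies: a lower-triangular boolean matrix over query/key positions, so each token attends only to itself and earlier tokens:

```python
import numpy as np

def causal_mask(seq_len):
    # True where query position i may attend to key position j (j <= i).
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

mask = causal_mask(4)
# Row 2 (the third token) may attend to positions 0, 1, 2 but not 3.
assert mask[2].tolist() == [True, True, True, False]
```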

Numerical Parity Fixes:

  • Identified and resolved several sources of numerical divergence. The implementation now achieves acceptable numerical parity with the original Hugging Face weights.

Code Refactoring & Cleanup:

  • Removed redundant logic and consolidated overlapping code paths from the previous draft.
  • Deleted unnecessary files to keep the keras-hub directory clean and maintainable.

Relationship to Previous Work:

Reference

Colab Notebook

Numerical Verification Results:

Text-only Validation

  • Predicted tokens: ✅ 100% match (35/35 positions)
  • Mean absolute difference: 0.00045175
  • Elements within 1e-3: 99.73%

Multimodal Validation (text + image + audio):

  • Predicted tokens: ✅ 100% match (460/460 positions)
  • Mean absolute difference: 0.00082702
  • Per-modality breakdown:
    Text (14 positions): mean=0.00126, max=0.01021
    Vision (257 positions): mean=0.00072, max=0.00865
    Audio (189 positions): mean=0.00094, max=0.02665
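As a quick consistency check, the overall multimodal mean absolute difference reported above is recoverable as the position-weighted average of the per-modality means:

```python
# Per-modality position counts and mean absolute differences from the
# validation run above.
counts = {"text": 14, "vision": 257, "audio": 189}
means = {"text": 0.00126, "vision": 0.00072, "audio": 0.00094}

total = sum(counts.values())
overall = sum(counts[m] * means[m] for m in counts) / total

assert total == 460
# Matches the reported overall mean of 0.00082702 to within rounding.
assert abs(overall - 0.00082702) < 1e-5
```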

Note on tolerance

  • Gemma3n has a uniquely deep architecture — 30 decoder layers with AltUp (4-way prediction/correction), Laurel blocks, and per-layer input gating.

  • Cross-framework float32 rounding differences (JAX/XLA vs PyTorch) accumulate ~5.6e-06 per layer, compounding to ~4.5e-04 at the logit level.

  • Layer-by-layer debugging confirmed that input embeddings match perfectly (0.00 diff) and error grows linearly through the decoder stack — there is no implementation bug.

  • At atol=1e-3, 99.7% of logit elements match; at atol=1e-4, approximately 70% match.

  • The 100% token prediction match at every position confirms the conversion is functionally correct.
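The atol sensitivity can be illustrated with a synthetic stand-in: reference logits plus iid Gaussian noise of a comparable mean absolute scale. The real error is not iid, so the exact percentages differ from those above, but the qualitative picture is the same: element-wise matches drop sharply with atol while argmax (token prediction) is essentially unaffected.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for the logit comparison: 35 positions, 1000-way vocab,
# noise std chosen so mean |noise| is near the observed ~4.5e-4.
ref = rng.normal(size=(35, 1000))
ours = ref + rng.normal(scale=5.6e-4, size=ref.shape)

match_1e3 = np.isclose(ours, ref, atol=1e-3, rtol=0).mean()
match_1e4 = np.isclose(ours, ref, atol=1e-4, rtol=0).mean()
tokens_match = (ours.argmax(-1) == ref.argmax(-1)).mean()

# Element-wise agreement depends strongly on atol; argmax is robust to
# noise at this scale because logit gaps are far larger than 1e-3.
assert match_1e3 > match_1e4
assert tokens_match > 0.9
```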

Note on Parameter Count Mismatch:

  • The KerasHub Gemma3n backbone has 5,439,595,456 parameters, while the HF model has 5,439,438,272 (a difference of +157,184).
  • This is expected. The difference comes entirely from the pre-existing MobileNetV5 implementation in KerasHub, which sets use_bias=True for its rms_norm convolutions.
  • The HF/timm implementation does not use bias here.
  • We deliberately left MobileNetV5 untouched to preserve its existing Kaggle preset and avoid breaking downstream models.
  • The extra bias weights are initialized to zero during conversion and do not affect the output (as proven by the 100% token match in multimodal validation).
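The parameter arithmetic above checks out exactly:

```python
# Reported backbone parameter counts from the conversion run.
keras_hub_params = 5_439_595_456
hf_params = 5_439_438_272

# The gap is exactly the extra rms_norm bias terms in the pre-existing
# KerasHub MobileNetV5 vision tower (use_bias=True vs. HF/timm's no-bias).
assert keras_hub_params - hf_params == 157_184
```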

At atol=1e-3:
Screenshot 2026-02-23 at 10 30 56 PM
Screenshot 2026-02-23 at 10 31 45 PM

At atol=1e-4:

Screenshot 2026-02-23 at 10 33 43 PM Screenshot 2026-02-23 at 10 34 06 PM

Checklist

  • I have added all the necessary unit tests for my change.
  • I have verified that my change does not break existing code and works with all backends (TensorFlow, JAX, and PyTorch).
  • My PR is based on the latest changes of the main branch (if unsure, rebase the code).
  • I have followed the Keras Hub Model contribution guidelines in making these changes.
  • I have followed the Keras Hub API design guidelines in making these changes.
  • I have signed the Contributor License Agreement.

@github-actions github-actions bot added the Gemma Gemma model specific issues label Feb 19, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello @laxmareddyp, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request finalizes the integration of the Gemma 3n model into KerasHub, building upon previous foundational work. It introduces crucial architectural elements for multimodal processing, ensuring numerical accuracy against reference implementations and streamlining the overall code for production readiness. The changes enable the model to effectively handle text, image, and audio inputs with optimized performance.

Highlights

  • KV Sharing Implementation: Integrated Key-Value (KV) sharing to optimize memory usage and inference efficiency, aligning with the Gemma 3 architecture.
  • Causal Masking: Implemented proper causal masking logic to ensure the model correctly handles autoregressive sequences.
  • Numerical Parity Fixes: Identified and resolved several sources of numerical divergence, achieving acceptable numerical parity with the original Hugging Face weights.
  • Code Refactoring & Cleanup: Removed redundant logic, consolidated overlapping code paths, and deleted unnecessary files for a cleaner and more maintainable codebase.


Changelog
  • keras_hub/api/layers/__init__.py
    • Added imports for Gemma3nAudioConverter and Gemma3nImageConverter.
  • keras_hub/api/models/__init__.py
    • Added imports for Gemma3nBackbone, Gemma3nCausalLM, Gemma3nCausalLMPreprocessor, and Gemma3nTokenizer.
  • keras_hub/api/tokenizers/__init__.py
    • Added import for Gemma3nTokenizer.
  • keras_hub/src/models/gemma3n/gemma3n_attention.py
    • Added Gemma3nAudioRelativePositionEmbedding for audio sequence position embeddings.
    • Added Gemma3nTextAttention for multi-head attention in text sequences.
    • Added Gemma3nAudioAttention for chunk-based audio sequence attention.
  • keras_hub/src/models/gemma3n/gemma3n_audio_converter.py
    • Added Gemma3nAudioConverter class for converting raw audio waveforms into log-mel spectrograms.
  • keras_hub/src/models/gemma3n/gemma3n_audio_converter_test.py
    • Added tests for Gemma3nAudioConverter, covering output shape, padding, and normalization.
  • keras_hub/src/models/gemma3n/gemma3n_audio_encoder.py
    • Added Gemma3nAudioSubSampleConvProjection for subsampling audio features.
    • Added Gemma3nAudioConformerBlock for conformer architecture in audio processing.
    • Added Gemma3nAudioEncoder as the main audio encoder for the Gemma3n model.
  • keras_hub/src/models/gemma3n/gemma3n_audio_layers.py
    • Added Gemma3nAudioCumulativeGroupNorm for cumulative group normalization of audio features.
    • Added Gemma3nAudioSSCPConvBlock for spectrogram sub-sampling convolutional preprocessing.
    • Added Gemma3nAudioConformerFeedForward for feed-forward module in Conformer blocks.
    • Added Gemma3nAudioConformerAttention for multi-head self-attention in Conformer blocks.
  • keras_hub/src/models/gemma3n/gemma3n_backbone.py
    • Added Gemma3nMultimodalEmbedder for handling multimodal embeddings.
    • Added Gemma3nMultimodalEmbeddingProcessor for interleaving text, vision, and audio embeddings.
    • Added Gemma3nBackbone as the multimodal transformer backbone for the Gemma3n model.
  • keras_hub/src/models/gemma3n/gemma3n_backbone_test.py
    • Added tests for Gemma3nBackbone, covering multimodal and text-only configurations, and model saving.
  • keras_hub/src/models/gemma3n/gemma3n_causal_lm.py
    • Added Gemma3nCausalLM class for end-to-end multimodal causal language modeling.
  • keras_hub/src/models/gemma3n/gemma3n_causal_lm_preprocessor.py
    • Added Gemma3nCausalLMPreprocessor for multimodal input preprocessing for Gemma3nCausalLM.
  • keras_hub/src/models/gemma3n/gemma3n_causal_lm_preprocessor_test.py
    • Added tests for Gemma3nCausalLMPreprocessor, including text-only, vision, audio, and multimodal scenarios.
  • keras_hub/src/models/gemma3n/gemma3n_causal_lm_test.py
    • Added tests for Gemma3nCausalLM, covering basic functionality, flash attention, early stopping, and multimodal generation.
  • keras_hub/src/models/gemma3n/gemma3n_image_converter.py
    • Added Gemma3nImageConverter class for image preprocessing in Gemma3n models.
  • keras_hub/src/models/gemma3n/gemma3n_text_decoder.py
    • Added Gemma3nTextDecoderBlock for implementing a single Gemma3n decoder block.
  • keras_hub/src/models/gemma3n/gemma3n_text_layers.py
    • Added Gemma3nTextScaledWordEmbedding for scaled word embeddings.
    • Added Gemma3nTextMLP for Gemma3n-specific feed-forward networks with activation sparsity.
    • Added Gemma3nTextLaurelBlock for low-rank residual blocks.
    • Added Gemma3nTextAltUp for the Alternating Update (AltUp) mechanism.
  • keras_hub/src/models/gemma3n/gemma3n_text_model.py
    • Added Gemma3nTextModel as the core Gemma3n text model layer.
  • keras_hub/src/models/gemma3n/gemma3n_tokenizer.py
    • Added Gemma3nTokenizer class for tokenizing raw strings into integer sequences with special token handling.
  • keras_hub/src/models/gemma3n/gemma3n_tokenizer_test.py
    • Added tests for Gemma3nTokenizer, covering basic tokenization and error handling for missing special tokens.
  • keras_hub/src/models/gemma3n/rms_normalization.py
    • Added Gemma3nRMSNorm class for Gemma 3n specific RMS normalization.
  • keras_hub/src/tests/mocks/mock_gemma3n_tokenizer.py
    • Added MockGemma3nTokenizer for testing purposes, including special token definitions.
  • tools/checkpoint_conversion/convert_gemma3n_checkpoints.py
    • Added a script to convert Hugging Face Gemma3n model checkpoints to Keras format, including validation.
  • tools/sentencepiece_testing/create_gemma3n_test_proto.py
    • Added a utility script to create a SentencePiece proto file specifically for Gemma3n testing, including its special tokens.

@gemini-code-assist gemini-code-assist bot left a comment
Code Review

This pull request introduces a comprehensive implementation of the Gemma 3n model, including its multimodal capabilities for text, image, and audio. While no specific security vulnerabilities were identified, there are critical issues with backend-agnosticism in the preprocessing layers. Specifically, Gemma3nAudioConverter and Gemma3nCausalLMPreprocessor rely on TensorFlow-specific operations (tf.signal, tf.py_function, tf.strings, tf.RaggedTensor) rather than keras.ops, which is required for compatibility with JAX and PyTorch backends according to the KerasHub style guide. These layers must be refactored to use keras.ops to ensure backend-agnosticism. Addressing these issues will make this an excellent contribution.

@laxmareddyp laxmareddyp added the kokoro:force-run Runs Tests on GPU label Feb 24, 2026
@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label Feb 24, 2026
@laxmareddyp laxmareddyp added the kokoro:force-run Runs Tests on GPU label Feb 24, 2026
@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label Feb 24, 2026
@sachinprasadhs sachinprasadhs added the new model For PRs that contribute a new model to the Keras Hub registry. label Feb 24, 2026
@sachinprasadhs sachinprasadhs left a comment

Thank you!
I have reviewed a few files and made comments; please check.

Comment on lines +31 to +33
conf_num_attention_heads,
conf_attention_context_left,
conf_attention_context_right,

conf_num_attention_heads --> num_attention_heads
conf_attention_context_left --> num_attention_context_left
conf_attention_context_right --> num_attention_context_right


Args:
hidden_size: int. The size of the hidden state.
conf_num_attention_heads: int. The number of attention heads.

conf_num_attention_heads --> num_attention_heads


Same changes in arg and in description

Comment on lines +577 to +583
conf_num_attention_heads: int. The number of attention heads.
conf_attention_chunk_size: int. The size of each processing chunk.
conf_attention_context_right: int. The number of steps to attend to in
the future.
conf_attention_context_left: int. The number of steps to attend to in
the past, including the current step.
conf_attention_logit_cap: float. The soft cap value to apply to the

conf_num_attention_heads --> num_attention_heads
conf_attention_chunk_size --> num_attention_context_right
conf_attention_context_left --> num_attention_context_left
conf_attention_logit_cap --> attention_logit_cap
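For reviewers' reference, the context_left/context_right pair documented above defines a local attention window. A minimal numpy sketch of the mask it implies, under the docstring's convention that context_left includes the current step (chunking is omitted for brevity):

```python
import numpy as np

def local_attention_mask(seq_len, context_left, context_right):
    # Position i may attend to positions j with
    # i - context_left < j <= i + context_right,
    # where context_left counts the current step as 1.
    q = np.arange(seq_len)[:, None]
    k = np.arange(seq_len)[None, :]
    return (k > q - context_left) & (k <= q + context_right)

mask = local_attention_mask(6, context_left=3, context_right=1)
# Position 3 attends to 1, 2, 3 (past incl. current) and 4 (future).
assert mask[3].tolist() == [False, True, True, True, True, False]
```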

Comment on lines +3 to +4
import keras
import numpy as np

Add from keras import ops and then use the ops as ops.xxx instead of keras.ops.xxx everywhere.

np.arange(num_timescales, dtype="float32")
* -log_timescale_increment
)
self.inv_timescales = keras.ops.expand_dims(

After importing ops, use ops.expand_dims, follow this for all the ops.

self._allow_non_tensor_positional_args = True
self.built = True

def _create_fb_matrix(

avoid abbreviation, name it something like _create_filterbank_matrix

return batch_outputs_features, None
return batch_outputs_features, batch_outputs_masks

def call(

Add call argument details.


@keras_hub_export("keras_hub.layers.Gemma3nAudioConverter")
class Gemma3nAudioConverter(keras.layers.Layer):
"""Converts raw audio waveforms into log-mel spectrograms.

Add example usage section.

Comment on lines +467 to +469
"mel_floor": self.mel_floor_arg,
"per_bin_mean": self.per_bin_mean_arg,
"per_bin_stddev": self.per_bin_stddev_arg,

Keep the names constant and avoid suffix like _arg

@@ -0,0 +1,580 @@
import keras

Import also from keras import ops and use ops.xxx in the file.

@sachinprasadhs sachinprasadhs left a comment

Few more file reviews.

Comment on lines +200 to +211
conf_residual_weight: float. The weight for the residual connection in
the feed-forward layers.
conf_num_attention_heads: int. The number of attention heads.
conf_attention_chunk_size: int. The size of chunks for local attention.
conf_attention_context_right: int. The right context size for local
attention.
conf_attention_context_left: int. The left context size for local
attention.
conf_attention_logit_cap: float. The maximum value for the attention
logits.
conf_conv_kernel_size: int. The kernel size for the 1D convolution
layer.

Remove this conf_ prefix, not Keras Hub standard.

Comment on lines +300 to +302
def compute_output_shape(self, input_shape):
audio_encodings_shape, _ = input_shape
return audio_encodings_shape

This is not consistent with the build shape check; this handles only one type.

current_f_for_block_input = input_feat_size
self.calculated_block_padding = []
self.calculated_f_out_dims = []
for i in range(2):

Why is this hardcoded? Is it always assumed to be 2 for all configs, or should it be the length of sscp_conv_kernel_size?

Comment on lines +110 to +155
def build(self, input_shape):
_, t_in, f_in = input_shape
conv0_input_shape = (None, 1, t_in, f_in)
self.conv_0.build(conv0_input_shape)
if t_in is not None:
pad_t_top_0, pad_t_bottom_0 = self.calculated_block_padding[0][2:4]
kernel_h_0, _ = self.sscp_conv_kernel_size[0]
stride_h_0, _ = self.sscp_conv_stride_size[0]
t_padded_0 = t_in + pad_t_top_0 + pad_t_bottom_0
t_out_0 = (t_padded_0 - kernel_h_0) // stride_h_0 + 1
else:
t_out_0 = None
c_out_0 = self.sscp_conv_channel_size[0]
f_out_0 = self.calculated_f_out_dims[0]
conv1_input_shape = (None, c_out_0, t_out_0, f_out_0)
self.conv_1.build(conv1_input_shape)
if t_out_0 is not None:
t_padded_1 = (
t_out_0
+ self.calculated_block_padding[1][2]
+ self.calculated_block_padding[1][3]
)
kernel_h_1, _ = self.sscp_conv_kernel_size[1]
stride_h_1, _ = self.sscp_conv_stride_size[1]
t_out_1 = (t_padded_1 - kernel_h_1) // stride_h_1 + 1
else:
t_out_1 = None
c_out_1 = self.sscp_conv_channel_size[1]
f_out_1 = self.calculated_f_out_dims[1]
proj_input_shape = (None, t_out_1, f_out_1 * c_out_1)
self.input_proj_linear.build(proj_input_shape)
super().build(input_shape)

def compute_output_shape(self, input_shape):
b, t_in, f_in = input_shape
if t_in is not None:
_, _, pad_t_top_0, pad_t_bottom_0 = self.calculated_block_padding[0]
kernel_h_0, _ = self.sscp_conv_kernel_size[0]
stride_h_0, _ = self.sscp_conv_stride_size[0]
t_padded_0 = t_in + pad_t_top_0 + pad_t_bottom_0
t_out_0 = (t_padded_0 - kernel_h_0) // stride_h_0 + 1
_, _, pad_t_top_1, pad_t_bottom_1 = self.calculated_block_padding[1]
kernel_h_1, _ = self.sscp_conv_kernel_size[1]
stride_h_1, _ = self.sscp_conv_stride_size[1]
t_padded_1 = t_out_0 + pad_t_top_1 + pad_t_bottom_1
t_out_1 = (t_padded_1 - kernel_h_1) // stride_h_1 + 1

This is also doing direct indexing of [0] and [1], based on the assumption that sscp_conv_channel_size has only 2 elements.
It's better to add a validation check that sscp_conv_channel_size does not exceed 2 elements and document why it is this way.

Comment on lines +472 to +474
time_stride_product = 1
for stride_pair in self.sscp_conv_stride_size:
time_stride_product *= stride_pair[0]

This is not being used in the code.

max_position_embeddings: int. The maximum sequence length.
vocab_size_per_layer_input: int. The vocab size for per-layer inputs.
hidden_size_per_layer_input: int. The hidden size for per-layer inputs.
altup_num_inputs: int. The number of inputs for the AltUp mechanism.

Alternating Updates (AltUp) for better clarity.
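As context for reviewers, the AltUp mechanism referenced here keeps several parallel copies of the hidden state and runs only one of them through each expensive decoder layer. A rough numpy sketch of the predict/correct idea; the coefficient shapes and values below are illustrative only, not the actual Gemma 3n parameterization:

```python
import numpy as np

rng = np.random.default_rng(0)
num_inputs, seq, dim = 4, 6, 8  # num_inputs mirrors altup_num_inputs

# Four parallel copies of the hidden state.
x = rng.normal(size=(num_inputs, seq, dim))

# Predict: mix the copies with learned coefficients (random stand-ins here).
predict_coefs = np.eye(num_inputs) + 0.1 * rng.normal(size=(num_inputs, num_inputs))
predicted = np.einsum("ij,jsd->isd", predict_coefs, x)

# Only the "active" copy goes through the (expensive) decoder layer;
# a cheap nonlinearity stands in for the layer here.
active = 0
layer_out = np.tanh(predicted[active])

# Correct: propagate the innovation (layer output minus prediction)
# back into every copy with learned per-copy weights.
innovation = layer_out - predicted[active]
correct_coefs = 1.0 + 0.1 * rng.normal(size=(num_inputs,))
corrected = predicted + correct_coefs[:, None, None] * innovation[None]

assert corrected.shape == x.shape
```

The payoff is a wider effective state at roughly the cost of a single-width layer per step.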

Comment on lines +199 to +207
dtype=input_ids_spec.dtype
if hasattr(input_ids_spec.dtype, "name")
else "float32",
)
num_layers = self.language_model.num_hidden_layers
per_layer_hidden_size = self.language_model.hidden_size_per_layer_input
per_layer_inputs_spec = keras.KerasTensor(
shape=(batch_size, seq_len, num_layers, per_layer_hidden_size),
dtype=input_ids_spec.dtype

Doesn't it have to be model compute dtype? not the token id dtype?

inputs_embeds,
)
if self.audio_encoder and self.embed_audio:
audio_mask = input_ids >= self.embed_audio.vocab_offset

This is not consistent with the vision upper bound logic.

input_data = {
"token_ids": np.random.randint(0, 50, size=(1, 16), dtype="int32"),
"attention_mask": np.ones((1, 1, 16, 16), dtype=bool),
"pixel_values": np.random.rand(1, 1, 224, 224, 3).astype("float32"),

Doesn't it have to be "images" instead of "pixel_values" for the model input, as per the implementation?

audio_indices = inputs.get("audio_indices", None)
vision_mask = inputs.get("vision_mask", None)
audio_mask = inputs.get("audio_mask", None)
audios = inputs.get("audios", None)

What is the use of this? This is not being used.

@sachinprasadhs sachinprasadhs left a comment

Reviewed the rest of the files. Please address them and mark the resolved comments as Resolved.
For the generic comments (naming conventions, import style, etc.), apply them to all the files.

Comment on lines +364 to +373
input_features is not None
and len(keras.ops.shape(input_features)) == 2
):
input_features = keras.ops.expand_dims(input_features, axis=0)
if (
input_features_mask is not None
and len(keras.ops.shape(input_features_mask)) == 1
):
input_features_mask = keras.ops.expand_dims(
input_features_mask, axis=0

This logic is contradicting to the docstring, as per the docstring, unbatched input feature should be (num_audios, audio_seq_len, feature_size) but here you are checking only rank 2. Same for input_features_mask

if len(audios.shape) > 1:
audios = tf.RaggedTensor.from_tensor(audios)
else:
audios = tf.ragged.constant([audios.numpy()], dtype=tf.float32)

I suspect .numpy() would fail in graph mode, and currently this will not be caught by any of the tests.

):
# If a 4D attention mask is passed,
# squeeze it to 2D for standard processing.
if padding_mask is not None and len(keras.ops.shape(padding_mask)) == 4:

Use static rank for reliable result or to avoid failure in Graph mode len(padding_mask.shape)

Comment on lines +224 to +226
decoder_mask = merge_padding_and_attention_mask(
inputs=x, padding_mask=padding_mask, attention_mask=None
)

Here padding_mask which is passed is 4D, but in merge_padding_and_attention_mask it is documented as 2D, please check.

Comment on lines +317 to +320
reshape_shape = modalities_shape[:-1] + (
self.altup_num_inputs,
self.altup_num_inputs,
)

It's better to use ops.concatenate

MODEL_CONFIGS = {"mobilenetv5_300m_enc": mobilenetv5_config}


def convert_model(hf_config, dtype=None):

Is it possible to move the weight and config conversion under utils/transformers and keep the validation and other code here?


Labels

Gemma Gemma model specific issues new model For PRs that contribute a new model to the Keras Hub registry.
