Vectorize scatter operation in NumPy backend#22218

Open
0xRozier wants to merge 3 commits into keras-team:master from 0xRozier:fix/issue-22208-vectorize-scatter-numpy-backend

Conversation

@0xRozier commented Feb 19, 2026

Summary

  • Replace the Python for loop in scatter() with NumPy's vectorized np.add.at, yielding ~87x speedup for large-scale scatter operations (e.g. 10^6 updates)
  • The change is minimal (3 lines removed, 2 added) and follows the same pattern already used by scatter_update() in the same file

Details

The current implementation iterates through each index with a Python loop:

for i in range(indices.shape[0]):
    index = indices[i]
    zeros[tuple(index)] += values[i]

This bypasses NumPy's internal C-optimized loops. The fix replaces it with:

idx = tuple(indices.T)
np.add.at(zeros, idx, values)

np.add.at correctly handles duplicate indices via cumulative addition, maintaining full compatibility with existing behavior.
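The duplicate-index behavior described above can be checked with a small standalone sketch (the array shapes here are arbitrary, chosen only for illustration):

```python
import numpy as np

# Three updates into a 4x3 array, with a duplicate index (0, 1)
# whose contributions must accumulate.
indices = np.array([[0, 1], [2, 0], [0, 1]])
values = np.array([1.0, 2.0, 3.0])

# Loop version (the old implementation):
expected = np.zeros((4, 3))
for i in range(indices.shape[0]):
    expected[tuple(indices[i])] += values[i]

# Vectorized version (the new implementation):
result = np.zeros((4, 3))
np.add.at(result, tuple(indices.T), values)

# Both accumulate the duplicate: position (0, 1) receives 1.0 + 3.0 = 4.0.
assert np.array_equal(result, expected)
```

Note that plain fancy-index assignment (`zeros[idx] += values`) would *not* work here: NumPy buffers the reads, so duplicate indices would keep only the last update instead of accumulating. `np.add.at` performs unbuffered in-place addition, which is exactly the semantics the loop had.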

Benchmark (1M updates on a 1000x1000 array):

  • Before: ~3.86s
  • After: ~0.04s
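A rough way to reproduce a benchmark of this kind (sizes scaled down from the PR's 10^6 updates so the loop version finishes quickly; absolute timings will vary by machine):

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # scaled down from the PR's 1_000_000 updates
indices = rng.integers(0, 1000, size=(n, 2))
values = rng.random(n)

def loop_scatter():
    # Old implementation: one Python-level iteration per update.
    out = np.zeros((1000, 1000))
    for i in range(indices.shape[0]):
        out[tuple(indices[i])] += values[i]
    return out

def vectorized_scatter():
    # New implementation: single unbuffered accumulate in C.
    out = np.zeros((1000, 1000))
    np.add.at(out, tuple(indices.T), values)
    return out

# Sanity check: both implementations agree before timing them.
assert np.allclose(loop_scatter(), vectorized_scatter())

print("loop:      ", timeit.timeit(loop_scatter, number=3))
print("vectorized:", timeit.timeit(vectorized_scatter, number=3))
```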

Fixes #22208

Test plan

  • All 6 existing scatter-related tests pass (CoreOpsDynamicShapeTest, CoreOpsStaticShapeTest, CoreOpsCorrectnessTest)
  • Verified duplicate index accumulation works correctly
  • Verified 1D, 2D, 3D, and slice scatter cases

@google-cla bot commented Feb 19, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@gemini-code-assist bot (Contributor) commented

Summary of Changes

Hello @0xRozier, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the performance of the scatter operation within the NumPy backend by replacing a Python for loop with NumPy's vectorized np.add.at function. This change dramatically improves execution speed for large-scale data manipulations, ensuring more efficient tensor operations without altering existing behavior or compatibility.

Highlights

  • Performance Improvement: Replaced the Python for loop in the scatter() function with NumPy's vectorized np.add.at, yielding an ~87x speedup for large-scale scatter operations (e.g., 10^6 updates).
  • Code Simplification: Achieved significant performance gains with minimal code changes (3 lines removed, 2 added), aligning with the pattern already used by scatter_update() in the same file.


Changelog
  • keras/src/backend/numpy/core.py
    • Replaced the iterative for loop implementation of scatter with a vectorized np.add.at call.
Activity
  • Addressed issue [Performance] Vectorize scatter operation in NumPy backend #22208.
  • Verified all 6 existing scatter-related tests passed, including CoreOpsDynamicShapeTest, CoreOpsStaticShapeTest, and CoreOpsCorrectnessTest.
  • Confirmed correct handling of duplicate index accumulation.
  • Validated 1D, 2D, 3D, and slice scatter cases.
  • The pull request was generated with Claude Code.

@gemini-code-assist bot left a comment
Code Review

This is an excellent change that significantly improves the performance of the scatter operation in the NumPy backend. By replacing the Python for loop with the vectorized np.add.at function, the implementation becomes much more efficient, idiomatic, and concise. The change correctly maintains the behavior of accumulating values for duplicate indices and aligns well with similar patterns found in scatter_update within the same file. The performance gains described are substantial, making this a valuable optimization.

@codecov-commenter commented Feb 19, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.90%. Comparing base (1a0655b) to head (12a6f80).
⚠️ Report is 75 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #22218      +/-   ##
==========================================
+ Coverage   82.89%   82.90%   +0.01%     
==========================================
  Files         593      594       +1     
  Lines       64169    65843    +1674     
  Branches    10073    10292     +219     
==========================================
+ Hits        53192    54589    +1397     
- Misses       8385     8638     +253     
- Partials     2592     2616      +24     
Flag Coverage Δ
keras 82.73% <100.00%> (+0.01%) ⬆️
keras-jax 60.94% <0.00%> (-1.10%) ⬇️
keras-numpy 55.11% <100.00%> (-1.07%) ⬇️
keras-openvino 49.09% <0.00%> (+11.22%) ⬆️
keras-tensorflow 62.16% <0.00%> (-1.14%) ⬇️
keras-torch 61.01% <0.00%> (-1.11%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.

Replace the Python for-loop in `scatter()` with NumPy's `np.add.at`
for vectorized index accumulation. This yields ~87x speedup for
large-scale scatter operations (e.g. 10^6 updates).

Fixes keras-team#22208

Sort the axis list in `RMSNormalization.build()` and in
`_rms_normalization()` so that unsorted axes like `[-1, -2]` produce
the same `normalized_shape` and scale shape as `[-2, -1]`.

Adds a test covering unsorted contiguous axes.
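For context, the axis-order issue in that commit can be sketched independently of Keras (the `normalized_shape` helper below is hypothetical, not the actual `RMSNormalization.build()` code; it only illustrates why sorting the axis list makes the derived shape order-independent):

```python
# Deriving a scale/normalized shape from an axis list. Without sorting,
# [-1, -2] and [-2, -1] would index the input shape in different orders
# and yield differently ordered shapes.
input_shape = (8, 16, 32)

def normalized_shape(shape, axis):
    # Resolve negative axes, then sort so the axis order is canonical.
    resolved = sorted(a % len(shape) for a in axis)
    return tuple(shape[a] for a in resolved)

# Both orderings now produce the same shape.
assert (
    normalized_shape(input_shape, [-1, -2])
    == normalized_shape(input_shape, [-2, -1])
    == (16, 32)
)
```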
@0xRozier force-pushed the fix/issue-22208-vectorize-scatter-numpy-backend branch from ab20f3a to 89021f4 on February 19, 2026, 17:45
@0xRozier (Author) commented

@hertschuh, if you could take a quick look, that would be great (no rush, take your time)

@hertschuh (Collaborator) commented

There seems to be an unrelated fix with RMS normalization, should that be a separate PR?

@0xRozier (Author) commented Mar 2, 2026

You're right, the RMS normalization fix is unrelated. I can remove it from this PR and submit it as a separate one — let me know if you'd prefer that.

For context: it addresses a minor bug where passing unsorted axes (e.g. axis=[-1, -2]) to RMSNormalization produces an incorrect scale shape. Happy to open a dedicated issue and PR for it.

@hertschuh (Collaborator) commented

You're right, the RMS normalization fix is unrelated. I can remove it from this PR and submit it as a separate one — let me know if you'd prefer that.

For context: it addresses a minor bug where passing unsorted axes (e.g. axis=[-1, -2]) to RMSNormalization produces an incorrect scale shape. Happy to open a dedicated issue and PR for it.

Yes, please separate the RMSNormalization fix and remove it from this PR.

For one thing, there are already 2 other PRs addressing the same RMSNormalization issue.

@0xRozier (Author) commented Mar 4, 2026

Done — I've removed the RMSNormalization fix from this PR. It now only contains the scatter vectorization change.



Development

Successfully merging this pull request may close these issues.

[Performance] Vectorize scatter operation in NumPy backend

4 participants