
Metal backend: Add topk fallback kernel via MPSGraph #18876

Merged

manuelcandales merged 10 commits into main from gh/manuelcandales/171/head on Apr 20, 2026

Conversation

@manuelcandales
Contributor

Adds aoti_torch_mps_topk using MPSGraph's topKWithSourceTensor, required for MoE expert routing (torch.topk in SparseMoE.forward). Supports an arbitrary dim via a transpose-topk-transpose pattern, both largest and smallest modes, and float32 and bfloat16 inputs. Includes MPSGraph caching and an int32-to-int64 indices conversion (AOTInductor expects int64 indices, while MPSGraph outputs int32).

Authored with Claude.
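
For context, here is a minimal sketch of the graph construction described above, with fixed shapes, k, and dim chosen purely for illustration. The placeholder names and structure are not taken from the actual op_topk.mm.

```objc
// Sketch only: topk over dim=0 of an 8x64 float32 tensor with k=4, largest=false.
// MPSGraph's topK operates on the last axis, hence the transpose-topk-transpose.
MPSGraph* graph = [MPSGraph new];
MPSGraphTensor* input =
    [graph placeholderWithShape:@[ @8, @64 ] dataType:MPSDataTypeFloat32 name:nil];

// Move the requested dim to the last axis.
MPSGraphTensor* x = [graph transposeTensor:input dimension:0 withDimension:1 name:nil];

// Smallest mode: negate, take the largest k, then negate the values back.
BOOL largest = NO;
if (!largest) {
  x = [graph negativeWithTensor:x name:nil];
}

NSArray<MPSGraphTensor*>* topk = [graph topKWithSourceTensor:x k:4 name:nil];
MPSGraphTensor* values = topk[0];
MPSGraphTensor* indices = topk[1];

if (!largest) {
  values = [graph negativeWithTensor:values name:nil];
}

// Move the topk axis back to the original dim.
values = [graph transposeTensor:values dimension:0 withDimension:1 name:nil];
indices = [graph transposeTensor:indices dimension:0 withDimension:1 name:nil];
// The op then widens MPSGraph's int32 indices to the int64 AOTInductor expects.
```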

@pytorch-bot

pytorch-bot Bot commented Apr 14, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18876

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 156 Pending

As of commit 933122c with merge base 4898af2:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@metascroy
Contributor

@claude review this code

@claude

claude Bot commented Apr 16, 2026

Claude finished @metascroy's task in 2m 25s.


Code Review: Metal backend topk fallback kernel

  • Read PR diff and changed files
  • Review op_topk.mm implementation
  • Review test coverage
  • Review build integration
  • Post review feedback

Overall the implementation is solid and follows the established patterns from the other Metal ops. The transpose-topk-transpose approach for arbitrary dims and the negate trick for smallest mode are correct. A few issues worth addressing:


Issues

1. Memory leak on error path (op_topk.mm:232-237)

If values_handle creation fails, the code frees both buffers but doesn't remove values_ptr from ptr_to_mtl_buffer. However, the bigger issue is earlier: if the MPSGraph execution itself throws (caught by the outer catch), both values_ptr and indices_ptr are leaked — they were allocated at lines 101-102 but the catch blocks at lines 290-296 don't free them.

Fix this →
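
A possible shape for the fix, sketched below. The allocation and free calls, error code, and variable placement are placeholders for whatever the op actually uses; the point is that both buffers are released on every failure path.

```objc
// Sketch: release both output allocations if anything after them throws.
void* values_ptr = nullptr;
void* indices_ptr = nullptr;
@try {
  values_ptr = allocate_metal_buffer(values_nbytes);    // placeholder allocator
  indices_ptr = allocate_metal_buffer(indices_nbytes);  // placeholder allocator
  // ... look up / build the cached MPSGraph and execute it ...
} @catch (NSException* e) {
  if (values_ptr) { free_metal_buffer(values_ptr); }    // placeholder deallocator
  if (indices_ptr) { free_metal_buffer(indices_ptr); }
  ET_LOG(Error, "aoti_torch_mps_topk failed: %s", e.reason.UTF8String);
  return Error::Internal;  // placeholder error code
}
```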

2. sorted parameter is silently ignored (op_topk.mm:25)

The sorted parameter is accepted but never used. MPSGraph's topKWithSourceTensor returns sorted results by default, so this is likely fine in practice, but sorted=false won't actually give unsorted results. Worth either documenting this limitation or validating that sorted=true (and returning an error for sorted=false if unsorted output isn't supported).
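
One way to make the limitation explicit, sketched here; the logging macro and return value are placeholders for whatever the op actually returns.

```objc
// Sketch: MPSGraph's topK always returns sorted output, so reject the one
// configuration we cannot honor instead of silently ignoring it.
if (!sorted) {
  ET_LOG(Error, "aoti_torch_mps_topk: sorted=false is not supported; output is always sorted");
  return Error::InvalidArgument;  // placeholder error code
}
```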

3. Missing endKernelCoalescing() call (op_topk.mm:146,216)

The other ops (op_mm.mm:108, op_bmm.mm, op_convolution.mm) all call stream->endKernelCoalescing() before executing the MPSGraph. This op doesn't. This could cause issues if kernels were being coalesced when topk is called.

Fix this →
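
The fix is presumably a one-liner mirroring the other ops; the executeMPSGraph call shape below is illustrative.

```objc
// Sketch: flush any coalesced kernels before handing work to MPSGraph,
// as op_mm.mm / op_bmm.mm / op_convolution.mm do.
stream->endKernelCoalescing();
executeMPSGraph(stream, graph, feeds, results);  // call shape is illustrative
```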

4. Missing cache_stats.logStats() calls (op_topk.mm:125,148)

Other ops call cache_stats.logStats() after incrementing hits/misses. This op only increments the counters without calling logStats(), so cache statistics won't be periodically logged for topk operations.
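
For consistency with the other ops, something along these lines; the counter member names are assumptions based on the review's description.

```objc
// Sketch: log periodically right after updating the hit/miss counters.
if (cached_graph) {
  cache_stats.hits++;    // member name assumed from the other ops
} else {
  cache_stats.misses++;  // member name assumed from the other ops
}
cache_stats.logStats();
```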

5. Missing [... release] for MPSGraphTensorData objects (op_topk.mm:132-144,201-213)

The other ops (op_mm.mm:274-276, op_bmm.mm) explicitly release MPSGraphTensorData objects after execution. This op relies on the enclosing @autoreleasepool to clean them up, which works but is inconsistent with the existing pattern. Objects created with [[MPSGraphTensorData alloc] init...] are owned, not autoreleased, so the existing pattern of an explicit release after execution is the safer choice.
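
A sketch of the pattern the review refers to; buffer, shape, and dtype values are illustrative, and the explicit release assumes the file is compiled without ARC, as the releases in op_mm.mm suggest.

```objc
// Sketch: alloc'd MPSGraphTensorData is owned, so release it once the graph has run.
MPSGraphTensorData* input_data =
    [[MPSGraphTensorData alloc] initWithMTLBuffer:input_buffer
                                            shape:@[ @8, @64 ]
                                         dataType:MPSDataTypeFloat32];
// ... feed input_data to the graph and execute it ...
[input_data release];  // assumes non-ARC, matching the other ops
```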

6. @try/@catch only wraps cache-miss path (op_topk.mm:151-222)

The @try/@catch for ObjC exceptions only wraps the cache-miss graph construction and execution. The cache-hit execution at line 146 is not protected, so if executeMPSGraph throws an NSException on a cache hit, it won't be caught by the @try block (it will propagate to the C++ catch(...) which may not handle ObjC exceptions correctly on all toolchains).

Fix this →
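
A sketch of restructuring so both execution paths sit inside the same ObjC exception guard; the call shapes and error code are illustrative.

```objc
// Sketch: guard both the cache-hit and cache-miss executions with one @try/@catch.
@try {
  if (cached_graph) {
    executeMPSGraph(stream, cached_graph, feeds, results);  // cache hit
  } else {
    // ... build the graph, insert it into the cache ...
    executeMPSGraph(stream, new_graph, feeds, results);     // cache miss
  }
} @catch (NSException* e) {
  // Free values_ptr / indices_ptr here as well (see issue 1), log, and return.
  return Error::Internal;  // placeholder error code
}
```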

7. Redundant stream variable in int32→int64 conversion (op_topk.mm:251)

auto* stream_sync = getCurrentMetalStream();

This creates a new stream_sync variable, but stream (from line 37) is the same object. Minor, but unnecessary.


Nit

8. extern declaration inside function body (op_topk.mm:240)

extern std::unordered_map<void*, int32_t> memory_to_n_tensor;

This works but is unusual. The other ops don't seem to need this pattern — the memory.h header (already included transitively through common.h) should provide this declaration. If it doesn't, the extern should be at namespace scope, not inside the function.
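
If memory.h really does not declare it, the namespace-scope form would look like this; the namespace name is a guess used only for illustration.

```objc
// Sketch: declare the shared map at namespace scope instead of inside the function.
// The namespace name here is illustrative, not taken from the source.
namespace executorch::backends::aoti {
extern std::unordered_map<void*, int32_t> memory_to_n_tensor;
}  // namespace executorch::backends::aoti
```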


Test Coverage

The TopK test module (test_modules.py:672-689) only tests dim=-1 (last dimension) with largest=True (default). Since the transpose logic for non-last dimensions and the negate trick for largest=False are the most complex parts of this implementation, it would be valuable to add test cases for:

  • A non-last dimension (e.g., dim=0)
  • largest=False

Build Integration

CMakeLists.txt and metal_backend.py changes look correct — op_topk.mm is added to sources and at::_ops::topk::call is registered as a supported fallback kernel.


size_t element_size;
MPSDataType mps_dtype;

if (dtype == static_cast<int32_t>(SupportedDTypes::FLOAT32)) {
Contributor


Nit: could these be factored out into general utilities?
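
A sketch of the suggested shared helper; the function name, its location, and the BFLOAT16 enumerator are assumptions.

```objc
// Sketch of a shared dtype-mapping utility; names are hypothetical.
inline bool resolve_mps_dtype(int32_t dtype, size_t& element_size, MPSDataType& mps_dtype) {
  if (dtype == static_cast<int32_t>(SupportedDTypes::FLOAT32)) {
    element_size = 4;
    mps_dtype = MPSDataTypeFloat32;
    return true;
  }
  if (dtype == static_cast<int32_t>(SupportedDTypes::BFLOAT16)) {  // enumerator assumed
    element_size = 2;
    mps_dtype = MPSDataTypeBFloat16;
    return true;
  }
  return false;  // unsupported dtype
}
```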

Base automatically changed from gh/manuelcandales/170/head to main April 20, 2026 19:30
@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@manuelcandales merged commit 66e4656 into main on Apr 20, 2026
174 of 178 checks passed
@manuelcandales deleted the gh/manuelcandales/171/head branch on April 20, 2026 19:55

Labels

CLA Signed
