Summary
Request to investigate supporting Rollout Routing Replay (R3) in Megatron Core's MoE layer. R3 addresses a training instability specific to MoE models during reinforcement learning: when separate inference and training engines independently compute routing decisions, expert selection diverges for the same inputs, causing policy mismatch that can lead to training collapse.
Motivation
In RL pipelines that use separate inference (e.g., SGLang/vLLM) and training (e.g., Megatron) engines, the routers in each engine independently select experts. Even with identical weights, numerical differences cause ~10% of routers to disagree per forward pass, with 94% of tokens differing in at least one layer. This compounding mismatch destabilizes training: in experiments on Qwen3-30B-A3B, all three baseline GRPO runs collapsed, while every R3 run completed successfully.
R3 fixes this by caching the binary expert-selection mask from the inference engine and replaying it during training. Training logits are still computed normally (preserving gradient flow to the router), but expert selection is forced to match inference.
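For intuition, here is a minimal sketch of the replay step. The function name is illustrative (not from the paper or Megatron), and renormalizing the gating weights over the replayed experts is one possible choice:

```python
import torch

def replay_routing(router_logits: torch.Tensor, replay_mask: torch.Tensor):
    # router_logits: [num_tokens, num_experts], produced by the training router.
    # replay_mask:   [num_tokens, num_experts], binary top-k selection cached
    #                from the inference engine during rollout.
    probs = torch.softmax(router_logits, dim=-1)

    # Expert selection is replayed from inference rather than recomputed via
    # top-k, so training activates exactly the experts used at rollout time.
    gating = probs * replay_mask

    # Renormalize over the replayed experts; gradients still reach the
    # training router through `probs`.
    gating = gating / gating.sum(dim=-1, keepdim=True).clamp_min(1e-9)
    return gating
```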
Results (Qwen3-30B-A3B, GRPO):
- Average score: 71.83 vs. 62.23 for the baseline (single mini-step SFT)
- KL divergence between the training and inference policies reduced by roughly 2× (approaching dense-model levels)
- Catastrophic training collapse eliminated across all configurations tested
Requested Feature
Enable R3-style replay when Megatron is used as the training engine in hybrid RL pipelines. Investigate adding an option in megatron.core.transformer.moe to accept an external expert-selection mask during the forward pass, bypassing the router's top-k selection while still computing gating weights from the training router's logits.
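One possible shape for such an option, sketched as a standalone module for clarity (all names here are hypothetical; the real change would hook into the existing router in megatron.core.transformer.moe rather than add a new class):

```python
import torch

class ReplayableTopKRouter(torch.nn.Module):
    """Hypothetical illustration of the requested option, not Megatron code."""

    def __init__(self, hidden_size: int, num_experts: int, top_k: int):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.empty(num_experts, hidden_size))
        torch.nn.init.normal_(self.weight, std=0.02)
        self.top_k = top_k

    def forward(self, hidden: torch.Tensor,
                replay_mask: torch.Tensor | None = None):
        # hidden: [num_tokens, hidden_size]
        logits = torch.nn.functional.linear(hidden, self.weight)
        probs = torch.softmax(logits, dim=-1)

        if replay_mask is None:
            # Default path: top-k selection from the training router's logits.
            _, topk_idx = probs.topk(self.top_k, dim=-1)
            mask = torch.zeros_like(probs).scatter_(-1, topk_idx, 1.0)
        else:
            # R3 path: bypass top-k and replay the inference-time selection.
            mask = replay_mask.to(probs.dtype)

        # Gating weights always come from the training logits, so the router
        # keeps receiving gradients even when selection is replayed.
        gating = probs * mask
        gating = gating / gating.sum(dim=-1, keepdim=True).clamp_min(1e-9)
        return gating, mask
```

Keeping the mask as an optional argument leaves the default top-k path untouched, so the feature would be inert outside hybrid RL replay.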