Commit dc7a785

Qwen 3.5 MoE Metal: Use max-sized prefill example for dynamic inputs

With alloc_graph_input=False, ExecuTorch sets the input tensor's numel_bound_ from the serialized example size. A small example (T=2) therefore caps runtime inputs at 2 tokens. Use max_seq_len-1 as the prefill example size so that any prompt length from 2 to max_seq_len-1 tokens is accepted at runtime.

Authored with Claude.

ghstack-source-id: 601c7ed
ghstack-comment-id: 4263712315
Pull-Request: #18956

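
The bound described in the commit message can be illustrated with a toy model. This is a minimal sketch in plain Python, not ExecuTorch code: `BoundedInput`, `from_example`, `resize`, and `accepts` are hypothetical names, and the only behavior modeled is the one the message states, namely that the element-count bound is fixed from the export example and a larger runtime input is rejected.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class BoundedInput:
    """Hypothetical stand-in for an input whose capacity is fixed at export time."""

    numel_bound: int  # maximum element count, taken from the export example
    shape: tuple      # current logical shape

    @classmethod
    def from_example(cls, example_shape):
        # Assumption: the bound equals the element count of the example input.
        return cls(numel_bound=prod(example_shape), shape=tuple(example_shape))

    def resize(self, new_shape):
        # Reject any runtime shape that needs more elements than the bound allows.
        new_numel = prod(new_shape)
        if new_numel > self.numel_bound:
            raise ValueError(
                f"shape {tuple(new_shape)} needs {new_numel} elements, "
                f"bound is {self.numel_bound}"
            )
        self.shape = tuple(new_shape)


def accepts(example_shape, runtime_shape):
    """Return True if an input exported with example_shape can hold runtime_shape."""
    tensor = BoundedInput.from_example(example_shape)
    try:
        tensor.resize(runtime_shape)
    except ValueError:
        return False
    return True


max_seq_len = 4096  # hypothetical value, chosen only for illustration
max_prefill = max_seq_len - 1

# Old behavior: a T=2 example cannot hold a 128-token prompt.
old_ok = accepts((1, 2), (1, 128))
# New behavior: a max-sized example holds any prompt up to max_prefill tokens.
new_ok = accepts((1, max_prefill), (1, 128))
```

Under these assumptions, `old_ok` is `False` and `new_ok` is `True`, which mirrors the motivation for exporting with a max-sized example.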
1 parent 780ed26 commit dc7a785

1 file changed

Lines changed: 7 additions & 3 deletions

examples/models/qwen3_5_moe/export.py

@@ -661,10 +661,14 @@ def _export_metal(model, config, args):
     print("Decode export successful!")
 
     # --- Prefill method (T>=2, dynamic shape) ---
+    # Use max-sized example so the serialized numel_bound_ is large enough
+    # for any runtime input (Metal/AOTI pattern: alloc_graph_input=False
+    # means numel_bound_ comes from the export example size).
     print("Exporting prefill method...")
-    prefill_tokens = torch.tensor([[0, 1]], dtype=torch.long)
-    prefill_pos = torch.tensor([0, 1], dtype=torch.long)
-    seq_dim = Dim("seq_len", min=2, max=config.max_seq_len - 1)
+    max_prefill = config.max_seq_len - 1
+    prefill_tokens = torch.zeros((1, max_prefill), dtype=torch.long)
+    prefill_pos = torch.arange(max_prefill, dtype=torch.long)
+    seq_dim = Dim("seq_len", min=2, max=max_prefill)
     prefill_dynamic_shapes = ({1: seq_dim}, {0: seq_dim})
     with torch.no_grad():
         prefill_ep = export(
