Commit dc7a785
Qwen 3.5 MoE Metal: Use max-sized prefill example for dynamic inputs
With alloc_graph_input=False, ExecuTorch sets the input tensor's
numel_bound_ from the serialized example size. A small example (T=2)
prevents runtime inputs larger than 2 tokens. Use max_seq_len-1 as
the prefill example size so any prompt length is accepted at runtime.
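The effect described above can be illustrated with a small toy model. This is only a sketch, not ExecuTorch code: the `BoundedInput` class, its `accepts` method, and the `max_seq_len` value are hypothetical, and the sketch assumes the bound is simply the element count of the example used at export time.

```python
from math import prod


class BoundedInput:
    """Toy stand-in for an input tensor whose capacity is fixed at export time."""

    def __init__(self, example_shape):
        # Assumption: the capacity equals the element count of the serialized example.
        self.numel_bound = prod(example_shape)

    def accepts(self, runtime_shape):
        # A runtime input fits only if it needs no more elements than the bound.
        return prod(runtime_shape) <= self.numel_bound


max_seq_len = 4096  # hypothetical value, not taken from the commit

small = BoundedInput((1, 2))                # prefill example with T=2
large = BoundedInput((1, max_seq_len - 1))  # prefill example with T=max_seq_len-1

prompt_shape = (1, 100)  # a 100-token prompt
print(small.accepts(prompt_shape))  # False
print(large.accepts(prompt_shape))  # True
```

Under these assumptions, the T=2 example caps the input at 2 tokens, while the max_seq_len-1 example admits any prompt up to that length.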
Authored with Claude.
ghstack-source-id: 601c7ed
ghstack-comment-id: 4263712315
Pull-Request: #189561
parent 780ed26 commit dc7a785
1 file changed
Lines changed: 7 additions & 3 deletions
Diff hunk (line contents not captured): original lines 661 to 670 become lines 661 to 674. Three lines are added after line 663 (new lines 664 to 666), and original lines 665 to 667 are replaced by four new lines (new lines 668 to 671).