Open
Description
We would like to first get the Qwen3.5 model functionally working, and then optimize kernel performance to improve end-to-end (E2E) performance.
Functionality
- add XPU GDN op support to vLLM: [XPU] Support Qwen3-next/Qwen3.5 (vllm#33657)
  - this should be updated and merged after the refactor PR: [Model] Extract GatedDeltaNetAttention into shared layer for Qwen3Next and Qwen3.5 (vllm#37975)
- support fp32 ssm_state in the chunk_fwd_o kernel: Support f32 ssm_state in GDN kernel for Qwen3.5 (#220)
Performance optimizations
GDN attention
Base kernel version: #156
- optimize l2norm kernel: Optimize l2norm in GDN kernel for Qwen3.5 (#222)
- optimize chunk_fwd_o kernel
- optimize grouped gemm kernel
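For anyone validating the optimized l2norm kernel, a plain-Python reference may be useful. This is only a sketch of the usual definition (each vector divided by the square root of its sum of squares plus a small epsilon). The function names, the default `eps`, and the placement of `eps` inside the square root are assumptions for illustration and are not taken from the kernels in this repo.

```python
import math

def l2norm(row, eps=1e-6):
    """Scale one vector to (approximately) unit L2 norm.

    Assumption: eps is added to the sum of squares before the square root,
    which keeps an all-zero row from dividing by zero.
    """
    sq_sum = sum(v * v for v in row)
    rstd = 1.0 / math.sqrt(sq_sum + eps)
    return [v * rstd for v in row]

def l2norm_rows(rows, eps=1e-6):
    """Apply l2norm independently to every row (i.e. along the last dimension)."""
    return [l2norm(row, eps) for row in rows]
```

A kernel under test could be compared against `l2norm_rows` on small random inputs with a tolerance appropriate to the dtype.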
Layer Norm
- add SYCL kernels for GemmaRMSNorm and RMSNormGated: support gemma_rms_norm and rms_norm_gated kernels (#214)
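As a reference for checking the two layer norm kernels, here is a minimal pure-Python sketch of the math as it is commonly defined. It assumes the Gemma-style norm scales by `(1 + weight)` and that the gated variant normalizes first and then multiplies by `silu(gate)`. The function names, argument order, and default `eps` are hypothetical and do not reflect the SYCL kernel signatures in #214.

```python
import math

def rms_norm(x, eps=1e-6):
    """Divide a vector by its root mean square (no learned scale)."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms for v in x]

def gemma_rms_norm(x, weight, eps=1e-6):
    """Gemma-style RMSNorm: the learned scale is applied as (1 + weight)."""
    return [n * (1.0 + w) for n, w in zip(rms_norm(x, eps), weight)]

def silu(v):
    """SiLU activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))

def rms_norm_gated(x, gate, weight, eps=1e-6):
    """Gated RMSNorm, assuming normalization happens before gating."""
    return [n * w * silu(g) for n, w, g in zip(rms_norm(x, eps), weight, gate)]
```

If the kernel instead applies the gate before normalization, the reference would need to multiply `x` by `silu(gate)` first, so the assumed ordering is worth confirming against the model code.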
Qwen3 VisionTransformer
(placeholder)