Support for Qwen 3.5 Model Series (Dense and Quantized)#3396
woodRock wants to merge 1 commit into huggingface:main from
Conversation
- Implement Qwen 3.5 hybrid architecture (Gated DeltaNet + Gated Attention)
- Optimize linear attention prefill using GEMM kernels
- Add GGUF support for quantized Qwen 3.5 variants
- Update examples to support the new model series
Would love to have support for this. The jina ranking models also need it. Thanks for adding support @woodRock
Is the model usable now? I really want to add Qwen 3.5 as a local alternative to openclaw.
I was able to integrate this branch successfully in my project. Works great.
@rupurt Hi, how's the speed?
Slow as a dawg on CPU, but pretty decent on Apple hardware. It takes about 1 second per reranking result for me in sift on CPU.
So cool, thanks for doing this. Is there quantized support for the 35B?
Hello, I ran your examples on both the dense and quantized models, and they worked. However, when I try different prompts, the output starts repeating. I tried Dense 0.8B, 2B, 4B, 9B, and 27B with different temperature, top_k, and repeat_penalty settings, and the repetition still happened. The difference is that with bigger models the repetition starts later. Do you have any idea what the issue is? Thanks.
This PR implements support for the Qwen 3.5 model series, covering the dense variants (0.8B, 2B, 4B, 9B, and 27B) and their corresponding GGUF quantized versions.
Qwen 3.5 introduces a sophisticated hybrid architecture that alternates between Gated Attention (standard softmax-based attention) and Gated DeltaNet (a linear attention/SSM variant).
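As a rough illustration of such a hybrid stack, the sketch below picks a layer kind per index. The 3:1 DeltaNet-to-attention ratio and the `layer_kind` helper are assumptions for illustration only, not the actual Qwen 3.5 schedule or this PR's code.

```rust
/// Which kind of layer sits at a given depth in a hybrid stack.
#[derive(Debug, PartialEq, Clone, Copy)]
enum LayerKind {
    GatedAttention, // standard softmax attention with an output gate
    GatedDeltaNet,  // linear-attention / SSM-style recurrent layer
}

/// Hypothetical schedule: every 4th layer uses full attention,
/// the rest use Gated DeltaNet.
fn layer_kind(idx: usize) -> LayerKind {
    if idx % 4 == 3 {
        LayerKind::GatedAttention
    } else {
        LayerKind::GatedDeltaNet
    }
}

fn main() {
    let schedule: Vec<LayerKind> = (0..8).map(layer_kind).collect();
    println!("{:?}", schedule);
}
```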
Key Changes:
- Implemented the hybrid Gated DeltaNet + Gated Attention architecture.
- Added GGUF support, including proper mapping for SSM-specific tensors (ssm_in, ssm_conv1d, ssm_alpha, ssm_beta, etc.).
- Handled the rope_parameters structure used in Qwen 3.5.
- Updated the examples to support the new model series.
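To illustrate the kind of tensor-name mapping a GGUF loader needs, here is a minimal sketch. The source-side suffixes (`in_proj.weight`, `conv1d.weight`, etc.) and the `gguf_ssm_name` helper are hypothetical; only the `ssm_*` target names come from the list above.

```rust
/// Map a hypothetical source tensor-name suffix to its GGUF ssm_* name.
/// Returns None for suffixes that have no SSM-specific mapping.
fn gguf_ssm_name(suffix: &str) -> Option<&'static str> {
    let name = match suffix {
        "in_proj.weight" => "ssm_in.weight",
        "conv1d.weight" => "ssm_conv1d.weight",
        "alpha" => "ssm_alpha",
        "beta" => "ssm_beta",
        _ => return None,
    };
    Some(name)
}

fn main() {
    println!("{:?}", gguf_ssm_name("alpha"));
    println!("{:?}", gguf_ssm_name("q_proj.weight"));
}
```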
Performance & Optimization:

The Gated DeltaNet layers involve a recurrent state update that is inherently sequential ($S_t = S_{t-1} \times g_t + \dots$). The linear-attention prefill was refactored to use matrix multiplications (GEMM) instead of broadcasting/sums, and all loop-invariants (dtype conversions, exponentials) were hoisted out of the loop. This significantly reduces kernel launch overhead on Metal and CUDA.
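A toy, framework-free sketch of the GEMM reformulation: accumulating T rank-1 updates `S += k_t v_t^T` one token at a time gives the same matrix as the single product `K^T V`, so one GEMM call can replace T broadcast-and-sum kernel launches. The dimensions and helper names below are hypothetical, not this PR's code.

```rust
/// Rank-1 update loop: one broadcast-and-sum per token.
/// S[i][j] += k[t][i] * v[t][j] for each t in turn.
fn rank1_loop(k: &[Vec<f32>], v: &[Vec<f32>], d: usize) -> Vec<Vec<f32>> {
    let mut s = vec![vec![0.0; d]; d];
    for t in 0..k.len() {
        for i in 0..d {
            for j in 0..d {
                s[i][j] += k[t][i] * v[t][j];
            }
        }
    }
    s
}

/// Same result as a single GEMM: (K^T V)[i][j] = sum_t K[t][i] * V[t][j].
/// On GPU this is one matmul kernel instead of T separate launches.
fn gemm_kt_v(k: &[Vec<f32>], v: &[Vec<f32>], d: usize) -> Vec<Vec<f32>> {
    let mut s = vec![vec![0.0; d]; d];
    for i in 0..d {
        for j in 0..d {
            for t in 0..k.len() {
                s[i][j] += k[t][i] * v[t][j];
            }
        }
    }
    s
}

fn main() {
    let k = vec![vec![1.0, 2.0], vec![3.0, 4.0]];
    let v = vec![vec![5.0, 6.0], vec![7.0, 8.0]];
    println!("{:?}", rank1_loop(&k, &v, 2));
    println!("{:?}", gemm_kt_v(&k, &v, 2));
}
```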
Despite the GEMM-based operations, the prefill (prompt processing) phase is still bound by the sequential recurrence. For maximum prefill speeds, this architecture would benefit from a custom Fused Parallel Scan kernel (similar to the Selective Scan in Mamba) to parallelize the recurrence across the sequence dimension.
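A scalar toy of why such a scan is possible: each step of $S_t = g_t S_{t-1} + u_t$ is an affine map $x \mapsto g x + u$, and affine maps compose associatively, so all prefixes can be computed with a parallel scan of $O(\log T)$ depth. This is the idea behind fused scan kernels like Mamba's selective scan; the code below is a minimal single-threaded illustration, not this PR's implementation.

```rust
/// One recurrence step as an affine map x -> g * x + u.
#[derive(Debug, Clone, Copy)]
struct Affine {
    g: f32,
    u: f32,
}

/// Compose two steps: applying `a` first, then `b`, gives
/// b(a(x)) = (b.g * a.g) * x + (b.g * a.u + b.u).
fn combine(a: Affine, b: Affine) -> Affine {
    Affine { g: a.g * b.g, u: b.g * a.u + b.u }
}

/// Hillis-Steele inclusive scan; on parallel hardware each `stride`
/// round runs all updates concurrently, so depth is O(log T).
fn scan(mut xs: Vec<Affine>) -> Vec<Affine> {
    let n = xs.len();
    let mut stride = 1;
    while stride < n {
        let prev = xs.clone();
        for i in stride..n {
            xs[i] = combine(prev[i - stride], prev[i]);
        }
        stride *= 2;
    }
    xs
}

fn main() {
    // With S_0 = 0, the u-component of prefix t equals S_t.
    let steps = vec![
        Affine { g: 0.5, u: 1.0 },
        Affine { g: 0.9, u: 2.0 },
        Affine { g: 0.8, u: 3.0 },
    ];
    for p in scan(steps) {
        println!("{}", p.u);
    }
}
```

The sequential decode loop and this scan compute identical states; the scan just exposes the parallelism along the sequence dimension that a fused kernel would exploit during prefill.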
Testing:
Verified with Qwen/Qwen3.5-0.8B (BF16) and unsloth/Qwen3.5-0.8B-GGUF (Q4_K_M) on Metal.
Standard Dense:

```shell
cargo run --features metal --example qwen --release -- --model 3.5-0.8b --prompt "Tell me a short story."
```

Quantized GGUF:

```shell
cargo run --features metal --example quantized-qwen3_5 --release -- --which 0.8b --prompt "Explain the concept of quantum entanglement."
```

Closes #3393