
Add mixed precision support for LoRA expand & shrink kernels#230

Open
chaojun-zhang wants to merge 1 commit into vllm-project:main from chaojun-zhang:lora_ops_opt

Conversation

Contributor

@chaojun-zhang chaojun-zhang commented Mar 27, 2026


Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

PLEASE FILL IN THE PR DESCRIPTION HERE ENSURING ALL CHECKLIST ITEMS ABOVE HAVE BEEN CONSIDERED.

Purpose

Add mixed precision support for the LoRA expand & shrink kernels (used in vLLM), allowing float32 inputs with float16 weights.

Test Plan

pytest -s -v tests/test_lora_ops.py::test_kernels_mixed_precision
vLLM side: pytest -s -v tests/lora/test_layers.py

Test Result

(Optional) Documentation Update

BEFORE SUBMITTING, PLEASE READ https://docs.vllm.ai/en/latest/contributing (anything written below this line will be removed by GitHub Actions)

Signed-off-by: chaojun-zhang <chaojun.zhang@intel.com>
Copilot AI review requested due to automatic review settings March 27, 2026 07:54
@chaojun-zhang chaojun-zhang changed the title Support mixed precision for LoRA kernels Add mixed precision support for LoRA expand & shrink kernels Mar 27, 2026

Copilot AI left a comment


Pull request overview

Adds mixed-precision support to the XPU LoRA BGMV kernels so they can accept float32 inputs with fp16/bf16 LoRA weights, and introduces a new test variant to exercise this path.

Changes:

  • Extend bgmv_shrink / bgmv_expand_slice kernels to dispatch input and weight dtypes independently (allowing float32 inputs with fp16/bf16 weights).
  • Update test data generation utilities to allow separate input_dtype vs weight_dtype.
  • Add a new test_kernels_mixed_precision test matrix and mini-profiler params.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

File Description
tests/test_lora_ops.py Adds mixed-precision test coverage and threads input_dtype through kernel checks.
tests/lora/utils.py Allows generating inputs with a different dtype than the LoRA weights.
csrc/xpu/lora/lora_shrink.cpp Enables mixed dtype dispatch for shrink (independent input/weight types) and updates validation.
csrc/xpu/lora/lora_expand.cpp Enables mixed dtype dispatch for expand-slice and updates validation.


Comment on lines +528 to +531
"""
Tests LoRA kernels with mixed precision:
input=float32, weight=float16/bfloat16.
"""

Copilot AI Mar 27, 2026


This PR’s description template is still unfilled (missing Purpose and Test Result details). Please update the PR description so reviewers can understand intent and validate changes more quickly.

Comment on lines +228 to +229
// Use the minimum vec_size so both types can be vectorized
uint32_t vec_size = std::min(input_vec_size, weight_vec_size);

Copilot AI Mar 27, 2026


This file now uses std::min to compute vec_size, but it does not include <algorithm>. Please add the direct header include to avoid relying on transitive includes (which can break builds under different toolchains).

Comment on lines +391 to +393
using weight_t = std::remove_const_t<
std::remove_pointer_t<decltype(weight_ptr)>>;
VLLM_DISPATCH_FLOATING_TYPES(

Copilot AI Mar 27, 2026


The new dispatch logic uses std::remove_const_t / std::remove_pointer_t but this file does not include <type_traits>. Add the explicit include to avoid build fragility from transitive includes.

Comment on lines +282 to +283
// Use the minimum vec_size so both types can be vectorized
uint32_t vec_size = std::min(input_vec_size, weight_vec_size);

Copilot AI Mar 27, 2026


This file now uses std::min to compute vec_size, but it does not include <algorithm>. Please add the direct header include to avoid relying on transitive includes (which can break builds under different toolchains).

Comment on lines +456 to +459
auto dispatch_output = [&](auto* weight_ptr) {
using weight_t = std::remove_const_t<
std::remove_pointer_t<decltype(weight_ptr)>>;
switch (outputs.scalar_type()) {

Copilot AI Mar 27, 2026


The new dispatch logic uses std::remove_const_t / std::remove_pointer_t but this file does not include <type_traits>. Add the explicit include to avoid build fragility from transitive includes.

Comment on lines +528 to +531
"""
Tests LoRA kernels with mixed precision:
input=float32, weight=float16/bfloat16.
"""

Copilot AI Mar 27, 2026


The new mixed-precision test sets input_dtype=torch.float32, but the Torch reference path (tests/lora/torch_ops.py) casts inputs to output_tensor.dtype (fp16/bf16) before computing. That means this test may not actually validate the float32-input execution path (it may just compare against a fp16/bf16 reference). Consider updating the reference calculation (or adding a special-case in this test) so the reference keeps float32 inputs and only casts the final result to the output dtype as needed.
