Conversation
Pull request overview
Adds WebGPU support for updating position_ids via the DeviceInterface::UpdatePositionIds hook, enabling device-side updates during continuous decoding without always falling back to the CPU path.
Changes:
- Implement `InterfaceImpl::UpdatePositionIds` for the WebGPU device interface.
- Generate `position_ids` on CPU and upload to WebGPU device memory via ORT `CopyTensors`.
- Return `false` for non-continuous decoding (`batch_beam_size != 1`) to trigger the existing CPU fallback.
```cpp
// Get WebGPU allocator's memory info
const OrtMemoryInfo* webgpu_mem_info = nullptr;
Ort::ThrowOnError(Ort::api->AllocatorGetInfo(ort_allocator_, &webgpu_mem_info));

// Create CPU memory info
auto cpu_mem_info = OrtMemoryInfo::CreateCpu(OrtDeviceAllocator, OrtMemTypeDefault);
```
UpdatePositionIds is called once per decode step (often with new_kv_length == 1), but this implementation allocates a new std::vector and fetches allocator info on every call. This adds avoidable CPU overhead in the hot loop. Consider special-casing new_kv_length == 1 to use a small stack buffer (or std::array<... ,1>) and caching the OrtMemoryInfo* for the WebGPU allocator after InitOrt() (if its lifetime is stable), so the per-token path avoids heap allocation and repeated AllocatorGetInfo calls.
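A minimal sketch of the suggested single-token fast path. `FillPositionIds` is a hypothetical helper, and the caller-provided destination stands in for the staging buffer that would be wrapped in a CPU `OrtValue` and uploaded via `CopyTensors`; the cached-`OrtMemoryInfo*` part of the suggestion is assumed to live as a member set once after `InitOrt()`.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical fill helper: the common decode-step case (new_kv_length == 1)
// writes a single value and touches no heap; longer prompts fall through to
// the loop over a caller-provided buffer.
void FillPositionIds(int64_t* dst, int64_t start, int new_kv_length) {
  assert(new_kv_length > 0);
  if (new_kv_length == 1) {
    dst[0] = start;  // hot path: no loop, no allocation
    return;
  }
  for (int i = 0; i < new_kv_length; ++i) {
    dst[i] = start + i;
  }
}
```

A per-token caller can then keep the staging storage on the stack, e.g. `std::array<int64_t, 1> buf; FillPositionIds(buf.data(), start, 1);`, instead of constructing a `std::vector` every step.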
```cpp
if (type == ONNX_TENSOR_ELEMENT_DATA_TYPE_INT32) {
  // Generate int32 position_ids on CPU
  std::vector<int32_t> cpu_data(new_kv_length);
  for (int i = 0; i < new_kv_length; i++) {
    cpu_data[i] = static_cast<int32_t>(start + i);
  }

  // Create source tensor (CPU memory)
  auto src_tensor = OrtValue::CreateTensor(*cpu_mem_info, cpu_data.data(), new_kv_length * sizeof(int32_t), shape, type);

  // Create destination tensor (WebGPU device memory)
  auto dst_tensor = OrtValue::CreateTensor(*webgpu_mem_info, position_ids, new_kv_length * sizeof(int32_t), shape, type);

  // Copy from CPU to GPU using CopyTensors
  OrtValue* src_ptrs[] = {src_tensor.get()};
  OrtValue* dst_ptrs[] = {dst_tensor.get()};
  Ort::ThrowOnError(Ort::api->CopyTensors(&GetOrtEnv(), src_ptrs, dst_ptrs, nullptr, 1));
} else {
  // Generate int64 position_ids on CPU
  std::vector<int64_t> cpu_data(new_kv_length);
  for (int i = 0; i < new_kv_length; i++) {
    cpu_data[i] = static_cast<int64_t>(start + i);
  }

  // Create source tensor (CPU memory)
  auto src_tensor = OrtValue::CreateTensor(*cpu_mem_info, cpu_data.data(), new_kv_length * sizeof(int64_t), shape, type);

  // Create destination tensor (WebGPU device memory)
  auto dst_tensor = OrtValue::CreateTensor(*webgpu_mem_info, position_ids, new_kv_length * sizeof(int64_t), shape, type);

  // Copy from CPU to GPU using CopyTensors
  OrtValue* src_ptrs[] = {src_tensor.get()};
  OrtValue* dst_ptrs[] = {dst_tensor.get()};
  Ort::ThrowOnError(Ort::api->CopyTensors(&GetOrtEnv(), src_ptrs, dst_ptrs, nullptr, 1));
}
```
The int32/int64 branches here duplicate the same flow (fill CPU buffer -> wrap src/dst tensors -> CopyTensors) and differ only by element type/size. Refactoring to a small templated helper (or using Ort::SizeOf(type) plus a typed fill) would reduce duplication and make future changes less error-prone.