
[Feature] Verify and Support for DeepSeek-OCR-2#7665

Open
liutianyang-2026 wants to merge 10 commits into vllm-project:main from liutianyang-2026:feat-deepseek-ocr-2

Conversation


@liutianyang-2026 liutianyang-2026 commented Mar 26, 2026

What this PR does / why we need it?

This PR adds and validates DeepSeek-OCR-2 support in vllm-ascend, and includes compatibility/stability fixes found during bring-up.

Main changes:

  • Initialize and verify DeepSeek-OCR-2 support flow on Ascend.
  • Fix graph compilation failure in QKNormRopeFusionPass when rope_dim <= 0.
    • Root cause: DeepSeek-OCR-2 sets max_position_embeddings=0, which can lead to rope_dim=0.
  • Add a compatibility patch for newer transformers, and fix lm-eval failures on DeepSeek models.
    • Context: DeepSeek-OCR-2 depends on LlamaFlashAttention2 (though as a fallback), which is deprecated in later transformers versions.
  • Add DeepSeek-OCR-2 tutorial/documentation.
  • Add e2e model config for correctness validation:
    • tests/e2e/models/configs/DeepSeek-OCR-2.yaml
  • Apply CI/lint fixes for all related changes.

Why needed:

  • Enable reliable DeepSeek-OCR-2 usage on Ascend.
  • Avoid graph compile errors and lm-eval compatibility issues on relevant DeepSeek code paths.

Does this PR introduce any user-facing change?

Yes.

  • Adds DeepSeek-OCR-2 support and usage documentation.
  • Improves runtime compatibility/stability for affected DeepSeek model scenarios (newer transformers, lm-eval path, graph compilation edge case).

How was this patch tested?

  • Added e2e model config:

    • tests/e2e/models/configs/DeepSeek-OCR-2.yaml
  • Ran CI/lint alignment in follow-up commit (apply ci & lint).

  • vLLM version: v0.18.0

  • vLLM main: vllm-project/vllm@35141a7

Signed-off-by: Tianyang Liu <liutianyang@isrc.iscas.ac.cn>
… deepseek models
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates DeepSeek-OCR-2 into the vllm-ascend framework, enabling its use for advanced optical character recognition and document understanding on Ascend hardware. It addresses critical stability and compatibility issues encountered during the model's bring-up, such as graph compilation errors due to specific model configurations and conflicts with newer transformers library versions. The changes ensure reliable deployment and operation of DeepSeek-OCR-2, accompanied by detailed documentation and testing configurations.

Highlights

  • DeepSeek-OCR-2 Support: Introduced and validated support for the DeepSeek-OCR-2 vision-language model within the vllm-ascend framework.
  • Graph Compilation Fix: Resolved a graph compilation failure in QKNormRopeFusionPass by adding a check for non-positive rope_dim, which can occur with models like DeepSeek-OCR-2.
  • Transformers Compatibility: Implemented a compatibility patch for newer transformers library versions to address the deprecation of LlamaFlashAttention2, preventing import errors for DeepSeek models.
  • Documentation and Examples: Added comprehensive tutorial and documentation for DeepSeek-OCR-2, including environment setup, deployment, quick test scripts, API server instructions, and evaluation guidelines.
  • End-to-End Configuration: Included an end-to-end model configuration file for DeepSeek-OCR-2 to facilitate correctness validation and testing.



@liutianyang-2026 liutianyang-2026 changed the title [Feature] Verify and Support for DeepSeek-OCR-2 #6692 [Feature] Verify and Support for DeepSeek-OCR-2 Mar 26, 2026
@github-actions github-actions bot added the documentation and module:tests labels Mar 26, 2026
@github-actions
Contributor

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by fulfilling the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds support for the DeepSeek-OCR-2 model, including a new tutorial document, an end-to-end test configuration, and a critical patch to resolve LlamaFlashAttention2 import errors in newer transformers versions. A robustness check for rope_dim was also added to the QKNorm and Rope fusion pass. Feedback includes correcting the trust_remote_code parameter to True in the tutorial's Python example and fixing a mislabeled code block from python to bash for proper syntax highlighting.

@github-actions
Contributor

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@liutianyang-2026
Author

liutianyang-2026 commented Mar 26, 2026

Local Test Results

  • e2e pytest: log.log, Screenshot 2026-03-25 22:21:07, Screenshot 2026-03-25 22:21:40
  • server test: Screenshot 2026-03-25 22:24:47
  • performance test: Screenshot 2026-03-25 22:29:17
  • CI/Lint test: Screenshot 2026-03-26 11:02:58

@liutianyang-2026
Author

SKILL: Adapt DeepSeek-OCR-2 (VLM) for vllm-ascend

Skill Overview

  • Skill Name: Adapt DeepSeek-OCR-2 (VLM) for vllm-ascend
  • Core Objective: Complete the end-to-end verification, dependency patching, compilation-pass optimization, and CI evaluation integration for a multi-modal large language model that relies on complex trust_remote_code customization, running on Ascend NPU.
  • Related Domains: Multiprocessing Context Isolation, Framework-level Monkey Patching, VLM Baseline Evaluation, Graph Compilation Passes.

Workflows & Best Practices

1. Process Isolation for Remote Code Loading

  • The Pitfall: When trust_remote_code=True is enabled, loading custom HuggingFace code in the main process prematurely triggers PyTorch's C++ initialization. When vLLM subsequently spawns child processes (EngineCore) using the default fork method, the children inherit a polluted OpenMP thread pool state on ARM/Ascend architectures, leading to random Invalid thread pool crashes.
  • Best Practice: You must enforce a spawn multiprocessing method and strict OpenMP thread limits via environment variables before any heavy libraries (like vllm or torch) are imported.
    export VLLM_WORKER_MULTIPROC_METHOD=spawn
    export OMP_WAIT_POLICY=PASSIVE
    export OMP_NUM_THREADS=1
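The same guard can be applied in a Python launcher script; a minimal sketch, assuming the environment variables above are all that is needed (the `main` entry point and the deferred-import pattern are illustrative, not code from this PR):

```python
import os

# These must be set before torch/vllm are imported; once the OpenMP
# runtime initializes, changing the variables has no effect.
os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"
os.environ["OMP_WAIT_POLICY"] = "PASSIVE"
os.environ["OMP_NUM_THREADS"] = "1"


def main():
    # Defensive: force spawn for any multiprocessing done in this
    # script itself, so children never inherit forked OpenMP state.
    import multiprocessing as mp
    mp.set_start_method("spawn", force=True)

    # Heavy libraries are imported only after the environment is fixed:
    # import vllm
    ...


if __name__ == "__main__":
    main()
```

The key design point is ordering: the `os.environ` assignments sit at module top level, above any heavy import, so they take effect before PyTorch's C++ initialization runs.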

2. Upstream Dependency Patching via Platform Hooks

  • The Pitfall: Legacy remote code (e.g., DeepSeek's modeling_deepseekv2.py) may import classes that have been removed in newer versions of the transformers library (like LlamaFlashAttention2), causing an ImportError during AutoConfig loading.
  • Best Practice: Never modify the user's HuggingFace cache or the model's source code. Instead, utilize vllm-ascend's dynamic patch mechanism. Inject a dummy alias (e.g., mapping LlamaFlashAttention2 to LlamaAttention) in vllm_ascend/patch/platform/ to bypass the inheritance check during the earliest stage of the framework lifecycle.
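The alias mechanism can be sketched as a small helper; the function name and its generic signature are illustrative, not the actual vllm_ascend patch code:

```python
import importlib


def alias_removed_class(module_name, removed, replacement):
    """Inject `removed` as an alias of `replacement` into a module.

    Legacy remote code doing `from <module> import <removed>` then
    resolves the alias instead of raising ImportError.
    """
    mod = importlib.import_module(module_name)
    if not hasattr(mod, removed):
        setattr(mod, removed, getattr(mod, replacement))
    return mod


# In a platform-stage patch this would run before AutoConfig loads
# any remote code, e.g. (paths per this PR's description):
# alias_removed_class("transformers.models.llama.modeling_llama",
#                     "LlamaFlashAttention2", "LlamaAttention")
```

Because the alias lives only in the running process, the HuggingFace cache and the model's source stay untouched, and the patch disappears when the process exits.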

3. Handling Anomalous Model Configurations in Graph Compilation

  • The Pitfall: DeepSeek-OCR-2 sets max_position_embeddings=0 in its configuration. This anomalous value cascades into the graph compilation phase, specifically causing rope_dim <= 0, which leads to a compilation failure in the QKNormRopeFusionPass (a pass designed to optimize QK-Norm and RoPE execution on NPU).
  • Best Practice: When adapting new models, always inspect their specific configuration anomalies (like zeroed-out position embeddings for pure vision/OCR tasks). Update the graph compilation passes to safely skip or handle invalid dimensions (e.g., if rope_dim <= 0: return) to prevent the entire compilation pipeline from breaking.
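The early-exit guard can be sketched as follows; the function name and return convention are hypothetical stand-ins for the actual QKNormRopeFusionPass:

```python
def apply_qk_norm_rope_fusion(graph, rope_dim: int) -> bool:
    """Return True if the fusion was applied, False if skipped."""
    if rope_dim <= 0:
        # DeepSeek-OCR-2 sets max_position_embeddings=0, which can
        # cascade into rope_dim == 0 here; attempting the fusion
        # would break graph compilation, so skip the pass instead.
        return False
    # ... perform the actual QK-Norm + RoPE fusion on `graph` ...
    return True
```

Returning a skipped/applied flag rather than raising keeps the rest of the compilation pipeline running for models with anomalous configurations.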

4. Efficient Client-Server Debugging & VLM Evaluation

  • The Pitfall: Mixing the evaluation harness (lm_eval) and the vLLM engine in a single process requires reloading tens of gigabytes of model weights for every configuration tweak, drastically slowing down debugging. Furthermore, lm_eval's dataset downloads can be unexpectedly hijacked by VLLM_USE_MODELSCOPE.
  • Best Practice:
    • Decoupled Architecture: Launch a persistent API Server (vllm.entrypoints.openai.api_server) in one terminal. Use lm_eval --model local-chat-completions as a lightweight client in another terminal to quickly iterate on VLM datasets like doc_vqa.
    • Baseline Establishment: Since base OCR models lack chat templates and tend to output verbose descriptions on zero-shot prompts, use in-context learning (e.g., --num_fewshot 3) to regularize the output format. Accept a non-zero, deterministic score (even if relatively low) as a valid regression baseline for hardware execution correctness.
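The decoupled setup above can be sketched as two terminals; the model path, port, and task name are illustrative, so check the lm_eval local-chat-completions documentation for the exact arguments your version expects:

```shell
# Terminal 1: persistent OpenAI-compatible vLLM API server
# (weights load once and stay resident across eval runs)
export VLLM_WORKER_MULTIPROC_METHOD=spawn
python -m vllm.entrypoints.openai.api_server \
    --model /path/to/DeepSeek-OCR-2 \
    --trust-remote-code \
    --port 8000

# Terminal 2: lm_eval as a lightweight client; configuration tweaks
# only restart this process, never the server
lm_eval --model local-chat-completions \
    --model_args model=/path/to/DeepSeek-OCR-2,base_url=http://localhost:8000/v1/chat/completions \
    --tasks doc_vqa \
    --num_fewshot 3
```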

Effective Prompts for AI Collaboration

  • "The model runs correctly occasionally but crashes with 'Invalid thread pool' randomly. Could you analyze the C++ stack trace to determine if there's a race condition related to the fork mechanism when loading Remote Code?"
  • "If modifying the HuggingFace cache is prohibited, at which lifecycle stage of vllm-ascend (Worker or Platform) should I inject a dependency fix patch to intercept AutoConfig?"

liutianyang-2026 and others added 3 commits March 26, 2026 11:33
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: liutianyang-2026 <liutianyang@isrc.iscas.ac.cn>
@liutianyang-2026 liutianyang-2026 marked this pull request as draft March 26, 2026 03:38
…heck

Signed-off-by: Tianyang Liu <liutianyang@isrc.iscas.ac.cn>
@liutianyang-2026 liutianyang-2026 marked this pull request as ready for review March 26, 2026 04:44
