
[Feature] Verify and Support for DeepSeek-OCR-2#7665

Open
liutianyang-2026 wants to merge 10 commits into vllm-project:main from liutianyang-2026:feat-deepseek-ocr-2

Conversation


@liutianyang-2026 liutianyang-2026 commented Mar 26, 2026

What this PR does / why we need it?

This PR adds and validates DeepSeek-OCR-2 support in vllm-ascend, and includes compatibility/stability fixes found during bring-up.

Main changes:

  • Initialize and verify DeepSeek-OCR-2 support flow on Ascend.
  • Fix graph compilation failure in QKNormRopeFusionPass when rope_dim <= 0.
    • Root cause: DeepSeek-OCR-2 sets max_position_embeddings=0, which can lead to rope_dim=0.
  • Add a compatibility patch for newer transformers, and fix lm-eval failures on DeepSeek models.
    • Context: DeepSeek-OCR-2 depends on LlamaFlashAttention2 (though as a fallback), which is deprecated in later transformers versions.
  • Add DeepSeek-OCR-2 tutorial/documentation.
  • Add e2e model config for correctness validation:
    • tests/e2e/models/configs/DeepSeek-OCR-2.yaml
  • Apply CI/lint fixes for all related changes.

Why needed:

  • Enable reliable DeepSeek-OCR-2 usage on Ascend.
  • Avoid graph compile errors and lm-eval compatibility issues on relevant DeepSeek code paths.

Does this PR introduce any user-facing change?

Yes.

  • Adds DeepSeek-OCR-2 support and usage documentation.
  • Improves runtime compatibility/stability for affected DeepSeek model scenarios (newer transformers, lm-eval path, graph compilation edge case).

How was this patch tested?

  • Added e2e model config:

    • tests/e2e/models/configs/DeepSeek-OCR-2.yaml
  • Ran CI/lint alignment in follow-up commit (apply ci & lint).

  • vLLM version: v0.18.0

  • vLLM main: vllm-project/vllm@35141a7

Signed-off-by: Tianyang Liu <liutianyang@isrc.iscas.ac.cn>
… deepseek models
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates DeepSeek-OCR-2 into the vllm-ascend framework, enabling its use for advanced optical character recognition and document understanding on Ascend hardware. It addresses critical stability and compatibility issues encountered during the model's bring-up, such as graph compilation errors due to specific model configurations and conflicts with newer transformers library versions. The changes ensure reliable deployment and operation of DeepSeek-OCR-2, accompanied by detailed documentation and testing configurations.

Highlights

  • DeepSeek-OCR-2 Support: Introduced and validated support for the DeepSeek-OCR-2 vision-language model within the vllm-ascend framework.
  • Graph Compilation Fix: Resolved a graph compilation failure in QKNormRopeFusionPass by adding a check for non-positive rope_dim, which can occur with models like DeepSeek-OCR-2.
  • Transformers Compatibility: Implemented a compatibility patch for newer transformers library versions to address the deprecation of LlamaFlashAttention2, preventing import errors for DeepSeek models.
  • Documentation and Examples: Added comprehensive tutorial and documentation for DeepSeek-OCR-2, including environment setup, deployment, quick test scripts, API server instructions, and evaluation guidelines.
  • End-to-End Configuration: Included an end-to-end model configuration file for DeepSeek-OCR-2 to facilitate correctness validation and testing.



@liutianyang-2026 liutianyang-2026 changed the title [Feature] Verify and Support for DeepSeek-OCR-2 #6692 [Feature] Verify and Support for DeepSeek-OCR-2 Mar 26, 2026
@github-actions github-actions bot added the documentation and module:tests labels Mar 26, 2026
@github-actions
Contributor

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by fulfilling the PR description to help reviewers and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds support for the DeepSeek-OCR-2 model, including a new tutorial document, an end-to-end test configuration, and a critical patch to resolve LlamaFlashAttention2 import errors in newer transformers versions. A robustness check for rope_dim was also added to the QKNorm and Rope fusion pass. Feedback includes correcting the trust_remote_code parameter to True in the tutorial's Python example and fixing a mislabeled code block from python to bash for proper syntax highlighting.

@github-actions
Contributor

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@liutianyang-2026
Author

liutianyang-2026 commented Mar 26, 2026

Local Test Results

  • e2e pytest: log.log, Screenshot 2026-03-25 22:21:07, Screenshot 2026-03-25 22:21:40
  • server test: Screenshot 2026-03-25 22:24:47
  • performance test: Screenshot 2026-03-25 22:29:17
  • CI/Lint test: Screenshot 2026-03-26 11:02:58

@liutianyang-2026
Author

SKILL: Adapt DeepSeek-OCR-2 (VLM) for vllm-ascend

Skill Overview

  • Skill Name: Adapt DeepSeek-OCR-2 (VLM) for vllm-ascend
  • Core Objective: Complete the end-to-end verification, dependency patching, compilation-pass optimization, and CI evaluation integration for a multi-modal large language model that relies on complex trust_remote_code customization, running on Ascend NPU.
  • Related Domains: Multiprocessing Context Isolation, Framework-level Monkey Patching, VLM Baseline Evaluation, Graph Compilation Passes.

Workflows & Best Practices

1. Process Isolation for Remote Code Loading

  • The Pitfall: When trust_remote_code=True is enabled, loading custom HuggingFace code in the main process prematurely triggers PyTorch's C++ initialization. When vLLM subsequently spawns child processes (EngineCore) using the default fork method, the children inherit a polluted OpenMP thread pool state on ARM/Ascend architectures, leading to random Invalid thread pool crashes.
  • Best Practice: You must enforce a spawn multiprocessing method and strict OpenMP thread limits via environment variables before any heavy libraries (like vllm or torch) are imported.
    export VLLM_WORKER_MULTIPROC_METHOD=spawn
    export OMP_WAIT_POLICY=PASSIVE
    export OMP_NUM_THREADS=1
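The same guard can be applied in a Python launcher script; a minimal sketch, assuming the environment variables above are all that is needed (the `main` entry point and the deferred-import pattern are illustrative, not code from this PR):

```python
import os

# These must be set before torch/vllm are imported; once the OpenMP
# runtime initializes, changing the variables has no effect.
os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"
os.environ["OMP_WAIT_POLICY"] = "PASSIVE"
os.environ["OMP_NUM_THREADS"] = "1"


def main():
    # Defensive: force spawn for any multiprocessing done in this
    # script itself, so children never inherit forked OpenMP state.
    import multiprocessing as mp
    mp.set_start_method("spawn", force=True)

    # Heavy libraries are imported only after the environment is fixed:
    # import vllm
    ...


if __name__ == "__main__":
    main()
```

The key design point is ordering: the `os.environ` assignments sit at module top level, above any heavy import, so they take effect before PyTorch's C++ initialization runs.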

2. Upstream Dependency Patching via Platform Hooks

  • The Pitfall: Legacy remote code (e.g., DeepSeek's modeling_deepseekv2.py) may import classes that have been removed in newer versions of the transformers library (like LlamaFlashAttention2), causing an ImportError during AutoConfig loading.
  • Best Practice: Never modify the user's HuggingFace cache or the model's source code. Instead, utilize vllm-ascend's dynamic patch mechanism. Inject a dummy alias (e.g., mapping LlamaFlashAttention2 to LlamaAttention) in vllm_ascend/patch/platform/ to bypass the inheritance check during the earliest stage of the framework lifecycle.
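The alias mechanism can be sketched as a small helper; the function name and its generic signature are illustrative, not the actual vllm_ascend patch code:

```python
import importlib


def alias_removed_class(module_name, removed, replacement):
    """Inject `removed` as an alias of `replacement` into a module.

    Legacy remote code doing `from <module> import <removed>` then
    resolves the alias instead of raising ImportError.
    """
    mod = importlib.import_module(module_name)
    if not hasattr(mod, removed):
        setattr(mod, removed, getattr(mod, replacement))
    return mod


# In a platform-stage patch this would run before AutoConfig loads
# any remote code, e.g. (paths per this PR's description):
# alias_removed_class("transformers.models.llama.modeling_llama",
#                     "LlamaFlashAttention2", "LlamaAttention")
```

Because the alias lives only in the running process, the HuggingFace cache and the model's source stay untouched, and the patch disappears when the process exits.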

3. Handling Anomalous Model Configurations in Graph Compilation

  • The Pitfall: DeepSeek-OCR-2 sets max_position_embeddings=0 in its configuration. This anomalous value cascades into the graph compilation phase, specifically causing rope_dim <= 0, which leads to a compilation failure in the QKNormRopeFusionPass (a pass designed to optimize QK-Norm and RoPE execution on NPU).
  • Best Practice: When adapting new models, always inspect their specific configuration anomalies (like zeroed-out position embeddings for pure vision/OCR tasks). Update the graph compilation passes to safely skip or handle invalid dimensions (e.g., if rope_dim <= 0: return) to prevent the entire compilation pipeline from breaking.
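The early-exit guard can be sketched as follows; the function name and return convention are hypothetical stand-ins for the actual QKNormRopeFusionPass:

```python
def apply_qk_norm_rope_fusion(graph, rope_dim: int) -> bool:
    """Return True if the fusion was applied, False if skipped."""
    if rope_dim <= 0:
        # DeepSeek-OCR-2 sets max_position_embeddings=0, which can
        # cascade into rope_dim == 0 here; attempting the fusion
        # would break graph compilation, so skip the pass instead.
        return False
    # ... perform the actual QK-Norm + RoPE fusion on `graph` ...
    return True
```

Returning a skipped/applied flag rather than raising keeps the rest of the compilation pipeline running for models with anomalous configurations.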

4. Efficient Client-Server Debugging & VLM Evaluation

  • The Pitfall: Mixing the evaluation harness (lm_eval) and the vLLM engine in a single process requires reloading tens of gigabytes of model weights for every configuration tweak, drastically slowing down debugging. Furthermore, lm_eval's dataset downloads can be unexpectedly hijacked by VLLM_USE_MODELSCOPE.
  • Best Practice:
    • Decoupled Architecture: Launch a persistent API Server (vllm.entrypoints.openai.api_server) in one terminal. Use lm_eval --model local-chat-completions as a lightweight client in another terminal to quickly iterate on VLM datasets like doc_vqa.
    • Baseline Establishment: Since base OCR models lack chat templates and tend to output verbose descriptions on zero-shot prompts, use in-context learning (e.g., --num_fewshot 3) to regularize the output format. Accept a non-zero, deterministic score (even if relatively low) as a valid regression baseline for hardware execution correctness.
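The decoupled setup above can be sketched as two terminals; the model path, port, and task name are illustrative, so check the lm_eval local-chat-completions documentation for the exact arguments your version expects:

```shell
# Terminal 1: persistent OpenAI-compatible vLLM API server
# (weights load once and stay resident across eval runs)
export VLLM_WORKER_MULTIPROC_METHOD=spawn
python -m vllm.entrypoints.openai.api_server \
    --model /path/to/DeepSeek-OCR-2 \
    --trust-remote-code \
    --port 8000

# Terminal 2: lm_eval as a lightweight client; configuration tweaks
# only restart this process, never the server
lm_eval --model local-chat-completions \
    --model_args model=/path/to/DeepSeek-OCR-2,base_url=http://localhost:8000/v1/chat/completions \
    --tasks doc_vqa \
    --num_fewshot 3
```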

Effective Prompts for AI Collaboration

  • "The model runs correctly occasionally but crashes with 'Invalid thread pool' randomly. Could you analyze the C++ stack trace to determine if there's a race condition related to the fork mechanism when loading Remote Code?"
  • "If modifying the HuggingFace cache is prohibited, at which lifecycle stage of vllm-ascend (Worker or Platform) should I inject a dependency fix patch to intercept AutoConfig?"

liutianyang-2026 and others added 3 commits March 26, 2026 11:33
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: liutianyang-2026 <liutianyang@isrc.iscas.ac.cn>
@liutianyang-2026 liutianyang-2026 marked this pull request as draft March 26, 2026 03:38
…heck

Signed-off-by: Tianyang Liu <liutianyang@isrc.iscas.ac.cn>
@liutianyang-2026 liutianyang-2026 marked this pull request as ready for review March 26, 2026 04:44
