
Verify determinism using model qwen2.5 #6476

Open
gongweibao wants to merge 8 commits into PaddlePaddle:develop from gongweibao:deter

Conversation

@gongweibao gongweibao commented Feb 13, 2026

[Feature] Add Deterministic Inference Support

Motivation

Implement deterministic inference support for FastDeploy to ensure reproducible results across multiple runs. Deterministic inference is critical for:

  • Debugging and testing models
  • Reproducing results in production
  • Ensuring consistency in distributed inference scenarios

The implementation addresses the following sources of non-determinism:

  1. All-Reduce operations in Tensor Parallelism (NCCL floating-point accumulation order)
  2. Batch-invariant operations (matrix multiplication, log_softmax, mean)
  3. Chunked Prefill alignment
  4. FlashAttention backend
  5. Sampling parameters seed management
  6. Scheduler request stealing

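Source 1 comes from floating-point addition not being associative, so a reduction whose accumulation order varies between runs can produce different bits each time. A minimal standalone demonstration (illustration only, not FastDeploy code):

```python
# Float addition is not associative: summing the same values in a
# different order can change the result. An all-reduce that picks its
# accumulation order dynamically (as NCCL may) inherits this.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # accumulate left-to-right
right = a + (b + c)  # accumulate right-to-left

print(left)           # 0.6000000000000001
print(right)          # 0.6
print(left == right)  # False
```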
Modifications

Core Implementation

| File | Description |
| --- | --- |
| `fastdeploy/envs.py` | Add `FD_DETERMINISTIC_MODE` environment variable |
| `fastdeploy/__init__.py` | Auto-initialize custom all-reduce in deterministic mode |
| `fastdeploy/distributed/communication.py` | Add deterministic mode checks and custom all-reduce integration |
| `fastdeploy/engine/common_engine.py` | Add deterministic mode support |
| `fastdeploy/engine/sampling_params.py` | Add deterministic parameter for sampling |
| `fastdeploy/engine/sched/resource_manager_v1.py` | Add deterministic alignment logic |
| `fastdeploy/model_executor/layers/attention/flash_attn_backend.py` | Add deterministic mode support for FlashAttention |
| `fastdeploy/model_executor/layers/batch_invariant_ops/batch_invariant_ops.py` | Enhance batch-invariant operations |
| `fastdeploy/model_executor/models/qwen2.py` | Add deterministic support for Qwen2 model |
| `fastdeploy/scheduler/splitwise_scheduler.py` | Remove random request stealing in deterministic mode |
| `fastdeploy/worker/gpu_model_runner.py` | Add deterministic mode handling |

Key Features

  1. Custom All-Reduce for Deterministic TP: Forces custom all-reduce in deterministic mode with fixed accumulation order (unlike NCCL's dynamic algorithm)
  2. Batch-Invariant Operations: Triton-based implementations for matmul, log_softmax, and mean
  3. Chunked Prefill Alignment: Ensures truncation points align to integer multiples of split_kv_size
  4. Deterministic Sampling: Seed-based sampling for reproducible results
  5. Error Handling: Explicit RuntimeErrors when deterministic requirements cannot be met
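The fixed accumulation order behind feature 1 can be sketched as follows (a hypothetical illustration of the principle, not the FastDeploy implementation, which runs on GPU):

```python
# Deterministic all-reduce idea: always combine per-rank partial sums
# in ascending rank order, so the floating-point accumulation order is
# fixed across runs, regardless of which rank's data arrives first.
def deterministic_all_reduce(partials):
    """partials: dict mapping rank -> local partial sum (float)."""
    total = 0.0
    for rank in sorted(partials):  # fixed order, independent of arrival
        total += partials[rank]
    return total

# Arrival order differs between runs; the result does not.
run_a = deterministic_all_reduce({0: 0.1, 1: 0.2, 2: 0.3})
run_b = deterministic_all_reduce({2: 0.3, 0: 0.1, 1: 0.2})
assert run_a == run_b
```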

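The chunked-prefill alignment in feature 3 amounts to rounding each truncation point down to a multiple of split_kv_size, so every run splits the KV cache at identical boundaries. A minimal sketch (`align_chunk` is a hypothetical helper, not an actual FastDeploy function):

```python
# Round a chunked-prefill truncation point down to an integer multiple
# of split_kv_size, so chunk boundaries are identical across runs.
def align_chunk(boundary, split_kv_size):
    if split_kv_size <= 0:
        raise ValueError("split_kv_size must be positive")
    return (boundary // split_kv_size) * split_kv_size

print(align_chunk(1000, 128))  # 896
print(align_chunk(896, 128))   # 896 (already aligned)
```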
Usage or Command

Enable Deterministic Mode

export FD_DETERMINISTIC_MODE=1

Run Inference with Determinism

from fastdeploy import LLM
llm = LLM(...)
result = llm.generate(...)  # Automatically uses deterministic all-reduce
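The seed-based sampling of feature 4 boils down to drawing from a generator that is freshly seeded per request, so repeated runs pick identical tokens. A sketch of the principle (not the FastDeploy sampler; `sample_tokens` is a hypothetical helper):

```python
import numpy as np

def sample_tokens(probs, n, seed):
    """Draw n token ids from a categorical distribution, reproducibly."""
    rng = np.random.default_rng(seed)  # fresh generator per request
    return rng.choice(len(probs), size=n, p=probs).tolist()

probs = [0.5, 0.3, 0.2]
# Same seed -> bit-identical sampling across runs.
assert sample_tokens(probs, 8, seed=42) == sample_tokens(probs, 8, seed=42)
```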

Run Tests

# All-reduce determinism test (requires 2+ GPUs)
python -m paddle.distributed.launch --gpus=0,1 tests/distributed/test_deterministic_all_reduce.py

# Batch-invariant operations test
python tests/batch_invariant_ops/test_batch_invariant_ops.py

# Sampling parameters determinism test
python tests/engine/test_sampling_params_determinism.py

Accuracy Tests

All unit tests pass:

  • Batch-invariant operations: 8 tests, 100% pass rate
  • Cache manager: 90 tests, 100% pass rate
  • Sampling parameters: 50 tests, 100% pass rate
  • Scheduler (local/dp): 42 tests, 100% pass rate
  • All-reduce determinism: Verified deterministic for float32/float16/bfloat16

Determinism Verification Results

======================================================================
Summary
======================================================================
Data Type       | Custom AR Deterministic   | NCCL Deterministic
----------------------------------------------------------------------
float32         | YES                      | NO
float16         | YES                      | NO
bfloat16        | YES                      | NO
======================================================================
Custom All-Reduce is deterministic for all supported types!
======================================================================

Checklist

  • Add at least a tag in the PR title.
  • Format your code, run pre-commit before commit.
  • Add unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch first.

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

@paddle-bot

paddle-bot bot commented Feb 13, 2026

Thanks for your contribution!

…manager

Add comprehensive determinism tests for Paddle attention layer and refactor
resource manager for deterministic mode support.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
