
Verify determinism using model qwen2.5 #6476

Open
gongweibao wants to merge 8 commits into PaddlePaddle:develop from gongweibao:deter

Conversation

@gongweibao gongweibao commented Feb 13, 2026

[Feature] Add Deterministic Inference Support

Motivation

Implement deterministic inference support for FastDeploy to ensure reproducible results across multiple runs. Deterministic inference is critical for:

  • Debugging and testing models
  • Reproducing results in production
  • Ensuring consistency in distributed inference scenarios

The implementation addresses the following sources of non-determinism:

  1. All-Reduce operations in Tensor Parallelism (NCCL floating-point accumulation order)
  2. Batch-invariant operations (matrix multiplication, log_softmax, mean)
  3. Chunked Prefill alignment
  4. FlashAttention backend
  5. Sampling parameters seed management
  6. Scheduler request stealing

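Source 1 comes from floating-point addition not being associative, so a reduction whose accumulation order varies between runs can produce different bits each time. A minimal standalone demonstration (illustration only, not FastDeploy code):

```python
# Float addition is not associative: summing the same values in a
# different order can change the result. An all-reduce that picks its
# accumulation order dynamically (as NCCL may) inherits this.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # accumulate left-to-right
right = a + (b + c)  # accumulate right-to-left

print(left)           # 0.6000000000000001
print(right)          # 0.6
print(left == right)  # False
```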
Modifications

Core Implementation

| File | Description |
| --- | --- |
| `fastdeploy/envs.py` | Add `FD_DETERMINISTIC_MODE` environment variable |
| `fastdeploy/__init__.py` | Auto-initialize custom all-reduce in deterministic mode |
| `fastdeploy/distributed/communication.py` | Add deterministic mode checks and custom all-reduce integration |
| `fastdeploy/engine/common_engine.py` | Add deterministic mode support |
| `fastdeploy/engine/sampling_params.py` | Add deterministic parameter for sampling |
| `fastdeploy/engine/sched/resource_manager_v1.py` | Add deterministic alignment logic |
| `fastdeploy/model_executor/layers/attention/flash_attn_backend.py` | Add deterministic mode support for FlashAttention |
| `fastdeploy/model_executor/layers/batch_invariant_ops/batch_invariant_ops.py` | Enhance batch-invariant operations |
| `fastdeploy/model_executor/models/qwen2.py` | Add deterministic support for Qwen2 model |
| `fastdeploy/scheduler/splitwise_scheduler.py` | Remove random request stealing in deterministic mode |
| `fastdeploy/worker/gpu_model_runner.py` | Add deterministic mode handling |

Key Features

  1. Custom All-Reduce for Deterministic TP: Forces custom all-reduce in deterministic mode with fixed accumulation order (unlike NCCL's dynamic algorithm)
  2. Batch-Invariant Operations: Triton-based implementations for matmul, log_softmax, and mean
  3. Chunked Prefill Alignment: Ensures truncation points align to integer multiples of split_kv_size
  4. Deterministic Sampling: Seed-based sampling for reproducible results
  5. Error Handling: Explicit RuntimeErrors when deterministic requirements cannot be met
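The fixed accumulation order behind feature 1 can be sketched as follows (a hypothetical illustration of the principle, not the FastDeploy implementation, which runs on GPU):

```python
# Deterministic all-reduce idea: always combine per-rank partial sums
# in ascending rank order, so the floating-point accumulation order is
# fixed across runs, regardless of which rank's data arrives first.
def deterministic_all_reduce(partials):
    """partials: dict mapping rank -> local partial sum (float)."""
    total = 0.0
    for rank in sorted(partials):  # fixed order, independent of arrival
        total += partials[rank]
    return total

# Arrival order differs between runs; the result does not.
run_a = deterministic_all_reduce({0: 0.1, 1: 0.2, 2: 0.3})
run_b = deterministic_all_reduce({2: 0.3, 0: 0.1, 1: 0.2})
assert run_a == run_b
```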

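The chunked-prefill alignment in feature 3 amounts to rounding each truncation point down to a multiple of split_kv_size, so every run splits the KV cache at identical boundaries. A minimal sketch (`align_chunk` is a hypothetical helper, not an actual FastDeploy function):

```python
# Round a chunked-prefill truncation point down to an integer multiple
# of split_kv_size, so chunk boundaries are identical across runs.
def align_chunk(boundary, split_kv_size):
    if split_kv_size <= 0:
        raise ValueError("split_kv_size must be positive")
    return (boundary // split_kv_size) * split_kv_size

print(align_chunk(1000, 128))  # 896
print(align_chunk(896, 128))   # 896 (already aligned)
```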
Usage or Command

Enable Deterministic Mode

export FD_DETERMINISTIC_MODE=1

Run Inference with Determinism

from fastdeploy import LLM
llm = LLM(...)
result = llm.generate(...)  # Automatically uses deterministic all-reduce
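The seed-based sampling of feature 4 boils down to drawing from a generator that is freshly seeded per request, so repeated runs pick identical tokens. A sketch of the principle (not the FastDeploy sampler; `sample_tokens` is a hypothetical helper):

```python
import numpy as np

def sample_tokens(probs, n, seed):
    """Draw n token ids from a categorical distribution, reproducibly."""
    rng = np.random.default_rng(seed)  # fresh generator per request
    return rng.choice(len(probs), size=n, p=probs).tolist()

probs = [0.5, 0.3, 0.2]
# Same seed -> bit-identical sampling across runs.
assert sample_tokens(probs, 8, seed=42) == sample_tokens(probs, 8, seed=42)
```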

Run Tests

# All-reduce determinism test (requires 2+ GPUs)
python -m paddle.distributed.launch --gpus=0,1 tests/distributed/test_deterministic_all_reduce.py

# Batch-invariant operations test
python tests/batch_invariant_ops/test_batch_invariant_ops.py

# Sampling parameters determinism test
python tests/engine/test_sampling_params_determinism.py

Accuracy Tests

All unit tests pass:

  • Batch-invariant operations: 8 tests, 100% pass rate
  • Cache manager: 90 tests, 100% pass rate
  • Sampling parameters: 50 tests, 100% pass rate
  • Scheduler (local/dp): 42 tests, 100% pass rate
  • All-reduce determinism: Verified deterministic for float32/float16/bfloat16

Determinism Verification Results

======================================================================
Summary
======================================================================
Data Type       | Custom AR Deterministic   | NCCL Deterministic
----------------------------------------------------------------------
float32         | YES                      | NO
float16         | YES                      | NO
bfloat16        | YES                      | NO
======================================================================
Custom All-Reduce is deterministic for all supported types!
======================================================================

Checklist

  • Add at least a tag in the PR title.
  • Format your code, run pre-commit before commit.
  • Add unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch first.

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

@paddle-bot

paddle-bot bot commented Feb 13, 2026

Thanks for your contribution!

…manager

Add comprehensive determinism tests for Paddle attention layer and refactor
resource manager for deterministic mode support.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
