Pull requests: deepspeedai/DeepSpeed
fix: remove premature MPI environment variable check in OpenMPIRunner (#7751, opened Dec 30, 2025 by leejianwoo-collab)
feat: add parameter-level precision control for BF16 training (#7750, opened Dec 30, 2025 by leejianwoo-collab)
Fix Muon optimizer checkpoint resume with bf16 mode (#7748, opened Dec 28, 2025 by yurekami, 2 tasks done)
fix(issue-7701): un-ignore .cuh under deepspeed/ops so multi_tensor_a… (#7739, opened Dec 20, 2025 by leejianwoo-collab)
Introduce Megatron-style parallel state management (#7726, opened Dec 15, 2025 by eternalNight, Draft)
Let allgather and alltoall execute in parallel when both attention and MoE use TP (#7723, opened Dec 11, 2025 by taozhiwei)
Add single-parameter allgather optimization for ZeRO-3 (#7661, opened Oct 31, 2025 by aeeeeeep)
HF2UCP: Converting a pytorch_model.bin or .safetensors checkpoint to UCP (#7212, opened Apr 10, 2025 by Schwidola0607)
[bugfix] update results of state_dict loading and embedding resizing to secondary partitions (hpz) (#7130, opened Mar 11, 2025 by cyr0930)
Fix: pipeline model with MoE causes an error when sending grads (#7055, opened Feb 19, 2025 by wukong1992)
Add pyproject.toml with legacy build backend to keep most logic in setup.py (#7033, opened Feb 13, 2025 by loadams, 4 of 5 tasks)
Enabled high-performance Automatic Tensor Parallelism (auto TP) for MoE models on multiple GPUs/HPUs (#6964, opened Jan 21, 2025 by gyou2021)