Merged
4 changes: 2 additions & 2 deletions docs/guides/ft-launcher-guide.md
@@ -1,6 +1,6 @@
# Fault Tolerance Launcher Guide

-The `ft_launcher` is provided by `nvidia-resiliency-ext` (included in NeMo RL dependencies) and enables automatic fault tolerance and recovery for distributed training runs.
+The `ft_launcher` is provided by `nvidia-resiliency-ext` (available via the `nvrx` optional extra, e.g. `uv run --extra nvrx ft_launcher ...`) and enables automatic fault tolerance and recovery for distributed training runs.
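
Because the launcher now comes from an optional extra, a quick pre-flight check can confirm it is present in the active environment before a long run is submitted. The following is a minimal sketch, not part of this change: the helper name `nvrx_status` is hypothetical, and the import name `nvidia_resiliency_ext` is assumed from the package name rather than confirmed by this diff.

```python
import importlib.util
import shutil


def nvrx_status(module_name: str = "nvidia_resiliency_ext",
                launcher: str = "ft_launcher") -> dict:
    """Report whether a module is importable and a launcher is on PATH.

    Both defaults are assumptions: the import name is inferred from the
    `nvidia-resiliency-ext` package name, and `ft_launcher` is the console
    script named in the guide.
    """
    return {
        # find_spec returns None when a top-level module cannot be located.
        "module_importable": importlib.util.find_spec(module_name) is not None,
        # shutil.which returns None when no executable of that name is on PATH.
        "launcher_on_path": shutil.which(launcher) is not None,
    }


if __name__ == "__main__":
    status = nvrx_status()
    print(status)
    if not all(status.values()):
        print("Hint: re-run with the optional extra, e.g. `uv run --extra nvrx ...`")
```

Running the script through `uv run --extra nvrx` and through plain `uv run` should make the difference between the two environments visible.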

## Key Arguments

@@ -14,7 +14,7 @@ The `ft_launcher` is provided by `nvidia-resiliency-ext` (included in NeMo RL de
## Basic Usage

```bash
-uv run ft_launcher \
+uv run --extra nvrx ft_launcher \
--ft-cfg-path examples/ft_launcher/ft_config.yaml \
--ft-rank-heartbeat-timeout 450 \
--ft-initial-rank-heartbeat-timeout 1200 \
4 changes: 3 additions & 1 deletion pyproject.toml
@@ -54,7 +54,6 @@ dependencies = [
"cuda-bindings", # for non-colocated refit
"pybase64", # for sglang refit
"nvidia-cudnn-cu12==9.19.0.56", # for transformer-engine no build isolation
-# nvidia-resiliency-ext removed: no Python 3.13 wheels available (v0.5.0 only has cp310-cp312)
]

[project.optional-dependencies]
@@ -109,6 +108,9 @@ mcore = [
"emerging-optimizers==0.2.0",
"deep_ep @ git+https://github.com/deepseek-ai/DeepEP.git@bfded34800dfec415b71503f8205181de90b2480",
]
+nvrx = [
+"nvidia-resiliency-ext",
+] # for ft_launcher (fault-tolerant training launcher)
nemo_gym = ["nemo_gym"]

[dependency-groups]
6 changes: 5 additions & 1 deletion uv.lock

Some generated files are not rendered by default.