Commit b4b921e

fix some links
1 parent 5702911 commit b4b921e

14 files changed: +29 -97 lines changed

fern/fern/pages/backends/sglang/README.mdx

Lines changed: 4 additions & 4 deletions
@@ -36,8 +36,8 @@ git checkout $(git describe --tags $(git rev-list --tags --max-count=1))
 
 | Feature | SGLang | Notes |
 |---------|--------|-------|
-| [**Disaggregated Serving**](../../design_docs/disagg_serving) || |
-| [**Conditional Disaggregation**](../../design_docs/disagg_serving#conditional-disaggregation) | 🚧 | WIP [PR](https://github.com/sgl-project/sglang/pull/7730) |
+| [**Disaggregated Serving**](../../design-docs/disagg-serving) || |
+| [**Conditional Disaggregation**](../../design-docs/disagg-serving#conditional-disaggregation) | 🚧 | WIP [PR](https://github.com/sgl-project/sglang/pull/7730) |
 | [**KV-Aware Routing**](../../router/kv_cache_routing) || |
 | [**SLA-Based Planner**](../../planner/sla_planner) || |
 | [**Multimodal Support**](../../multimodal/sglang) || |
@@ -57,7 +57,7 @@ Dynamo SGLang uses SGLang's native argument parser, so **most SGLang engine argu
 | Argument | Description | Default | SGLang Equivalent |
 |----------|-------------|---------|-------------------|
 | `--endpoint` | Dynamo endpoint in `dyn://namespace.component.endpoint` format | Auto-generated based on mode | N/A |
-| `--migration-limit` | Max times a request can migrate between workers for fault tolerance. See [Request Migration Architecture](../../fault_tolerance/request_migration). | `0` (disabled) | N/A |
+| `--migration-limit` | Max times a request can migrate between workers for fault tolerance. See [Request Migration Architecture](../../fault-tolerance/request_migration). | `0` (disabled) | N/A |
 | `--dyn-tool-call-parser` | Tool call parser for structured outputs (takes precedence over `--tool-call-parser`) | `None` | `--tool-call-parser` |
 | `--dyn-reasoning-parser` | Reasoning parser for CoT models (takes precedence over `--reasoning-parser`) | `None` | `--reasoning-parser` |
 | `--use-sglang-tokenizer` | Use SGLang's tokenizer instead of Dynamo's | `False` | N/A |
@@ -87,7 +87,7 @@ When a user cancels a request (e.g., by disconnecting from the frontend), the re
 ⚠️ SGLang backend currently does not support cancellation during remote prefill phase in disaggregated mode.
 </Callout>
 
-For more details, see the [Request Cancellation Architecture](../../fault_tolerance/request_cancellation) documentation.
+For more details, see the [Request Cancellation Architecture](../../fault-tolerance/request_cancellation) documentation.
 
 ## Installation
 
fern/fern/pages/backends/trtllm/README.mdx

Lines changed: 5 additions & 5 deletions
@@ -41,8 +41,8 @@ git checkout $(git describe --tags $(git rev-list --tags --max-count=1))
 
 | Feature | TensorRT-LLM | Notes |
 |---------|--------------|-------|
-| [**Disaggregated Serving**](../../design_docs/disagg_serving) || |
-| [**Conditional Disaggregation**](../../design_docs/disagg_serving#conditional-disaggregation) | 🚧 | Not supported yet |
+| [**Disaggregated Serving**](../../design-docs/disagg-serving) || |
+| [**Conditional Disaggregation**](../../design-docs/disagg-serving#conditional-disaggregation) | 🚧 | Not supported yet |
 | [**KV-Aware Routing**](../../router/kv_cache_routing) || |
 | [**SLA-Based Planner**](../../planner/sla_planner) || |
 | [**Load Based Planner**](../../planner/load_planner) | 🚧 | Planned |
@@ -178,7 +178,7 @@ Dynamo with TensorRT-LLM supports two methods for transferring KV cache in disag
 
 ## Request Migration
 
-You can enable [request migration](../../fault_tolerance/request_migration) to handle worker failures gracefully. Use the `--migration-limit` flag to specify how many times a request can be migrated to another worker:
+You can enable [request migration](../../fault-tolerance/request_migration) to handle worker failures gracefully. Use the `--migration-limit` flag to specify how many times a request can be migrated to another worker:
 
 ```bash
 # For decode and aggregated workers
@@ -189,7 +189,7 @@ python3 -m dynamo.trtllm ... --migration-limit=3
 **Prefill workers do not support request migration** and must use `--migration-limit=0` (the default). Prefill workers only process prompts and return KV cache state - they don't maintain long-running generation requests that would benefit from migration.
 </Callout>
 
-See the [Request Migration Architecture](../../fault_tolerance/request_migration) documentation for details on how this works.
+See the [Request Migration Architecture](../../fault-tolerance/request_migration) documentation for details on how this works.
 
 ## Request Cancellation
 
@@ -202,7 +202,7 @@ When a user cancels a request (e.g., by disconnecting from the frontend), the re
 | **Aggregated** |||
 | **Disaggregated** |||
 
-For more details, see the [Request Cancellation Architecture](../../fault_tolerance/request_cancellation) documentation.
+For more details, see the [Request Cancellation Architecture](../../fault-tolerance/request_cancellation) documentation.
 
 ## Client
 
fern/fern/pages/backends/vllm/README.mdx

Lines changed: 5 additions & 5 deletions
@@ -37,8 +37,8 @@ git checkout $(git describe --tags $(git rev-list --tags --max-count=1))
 
 | Feature | vLLM | Notes |
 |---------|------|-------|
-| [**Disaggregated Serving**](../../design_docs/disagg_serving) || |
-| [**Conditional Disaggregation**](../../design_docs/disagg_serving#conditional-disaggregation) | 🚧 | WIP |
+| [**Disaggregated Serving**](../../design-docs/disagg-serving) || |
+| [**Conditional Disaggregation**](../../design-docs/disagg-serving#conditional-disaggregation) | 🚧 | WIP |
 | [**KV-Aware Routing**](../../router/kv_cache_routing) || |
 | [**SLA-Based Planner**](../../planner/sla_planner) || |
 | [**Load Based Planner**](../../planner/load_planner) | 🚧 | WIP |
@@ -180,13 +180,13 @@ See the high-level notes in [KV Cache Routing](../../router/kv_cache_routing) on
 
 ## Request Migration
 
-You can enable [request migration](../../fault_tolerance/request_migration) to handle worker failures gracefully. Use the `--migration-limit` flag to specify how many times a request can be migrated to another worker:
+You can enable [request migration](../../fault-tolerance/request_migration) to handle worker failures gracefully. Use the `--migration-limit` flag to specify how many times a request can be migrated to another worker:
 
 ```bash
 python3 -m dynamo.vllm ... --migration-limit=3
 ```
 
-This allows a request to be migrated up to 3 times before failing. See the [Request Migration Architecture](../../fault_tolerance/request_migration) documentation for details on how this works.
+This allows a request to be migrated up to 3 times before failing. See the [Request Migration Architecture](../../fault-tolerance/request_migration) documentation for details on how this works.
 
 ## Request Cancellation
 
@@ -199,4 +199,4 @@ When a user cancels a request (e.g., by disconnecting from the frontend), the re
 | **Aggregated** |||
 | **Disaggregated** |||
 
-For more details, see the [Request Cancellation Architecture](../../fault_tolerance/request_cancellation) documentation.
+For more details, see the [Request Cancellation Architecture](../../fault-tolerance/request_cancellation) documentation.

fern/fern/pages/design-docs/architecture.mdx

Lines changed: 1 addition & 1 deletion
@@ -41,7 +41,7 @@ To address the growing demands of distributed inference serving, NVIDIA introduc
 
 The following diagram outlines Dynamo's high-level architecture. To enable large-scale distributed and disaggregated inference serving, Dynamo includes five key features:
 
-- [Dynamo Disaggregated Serving](disagg_serving)
+- [Dynamo Disaggregated Serving](disagg-serving)
 - [Dynamo Smart Router](../router/kv_cache_routing)
 - [Dynamo KV Cache Block Manager](../kvbm/kvbm_intro)
 - [Planner](../planner/planner_intro)

fern/fern/pages/development/backend-guide.mdx

Lines changed: 4 additions & 4 deletions
@@ -74,7 +74,7 @@ The `model_type` can be:
 - `model_name`: The name to call the model. Your incoming HTTP requests model name must match this. Defaults to the hugging face repo name or the folder name.
 - `context_length`: Max model length in tokens. Defaults to the model's set max. Only set this if you need to reduce KV cache allocation to fit into VRAM.
 - `kv_cache_block_size`: Size of a KV block for the engine, in tokens. Defaults to 16.
-- `migration_limit`: Maximum number of times a request may be [migrated to another Instance](../fault_tolerance/request_migration). Defaults to 0.
+- `migration_limit`: Maximum number of times a request may be [migrated to another Instance](../fault-tolerance/request_migration). Defaults to 0.
 - `user_data`: Optional dictionary containing custom metadata for worker behavior (e.g., LoRA configuration). Defaults to None.
 
 See `examples/backends` for full code examples.
@@ -116,7 +116,7 @@ In the P/D disaggregated setup you would have `deepseek-distill-llama8b.prefill.
 
 A Python worker may need to be shut down promptly, for example when the node running the worker is to be reclaimed and there isn't enough time to complete all ongoing requests before the shutdown deadline.
 
-In such cases, you can signal incomplete responses by raising a `GeneratorExit` exception in your generate loop. This will immediately close the response stream, signaling to the frontend that the stream is incomplete. With request migration enabled (see the [`migration_limit`](../fault_tolerance/request_migration) parameter), the frontend will automatically migrate the partially completed request to another worker instance, if available, to be completed.
+In such cases, you can signal incomplete responses by raising a `GeneratorExit` exception in your generate loop. This will immediately close the response stream, signaling to the frontend that the stream is incomplete. With request migration enabled (see the [`migration_limit`](../fault-tolerance/request_migration) parameter), the frontend will automatically migrate the partially completed request to another worker instance, if available, to be completed.
 
 <Callout intent="warning">
 We will update the `GeneratorExit` exception to a new Dynamo exception. Please expect minor code breaking change in the near future.
@@ -140,7 +140,7 @@ class RequestHandler:
 
 When `GeneratorExit` is raised, the frontend receives the incomplete response and can seamlessly continue generation on another available worker instance, preserving the user experience even during worker shutdowns.
 
-For more information about how request migration works, see the [Request Migration Architecture](../fault_tolerance/request_migration) documentation.
+For more information about how request migration works, see the [Request Migration Architecture](../fault-tolerance/request_migration) documentation.
 
 ## Request Cancellation
 
@@ -162,4 +162,4 @@ class RequestHandler:
 
 The context parameter is optional - if your generate method doesn't include it in its signature, Dynamo will call your method without the context argument.
 
-For detailed information about request cancellation, including async cancellation monitoring and context propagation patterns, see the [Request Cancellation Architecture](../fault_tolerance/request_cancellation) documentation.
+For detailed information about request cancellation, including async cancellation monitoring and context propagation patterns, see the [Request Cancellation Architecture](../fault-tolerance/request_cancellation) documentation.

fern/fern/pages/getting-started/examples.mdx

Lines changed: 0 additions & 6 deletions
@@ -51,9 +51,3 @@ cd examples/backends/sglang
 
 # Follow the README in each example directory
 ```
-
-## Next Steps
-
-- See the [Backends documentation](./backends/vllm/README) for detailed backend configuration
-- Check [Kubernetes Deployment](./kubernetes/README) for production deployments
-- Review [User Guides](./agents/tool-calling) for advanced features

fern/fern/pages/getting-started/installation.mdx

Lines changed: 0 additions & 6 deletions
@@ -46,9 +46,3 @@ docker run --rm -it \
 --network host \
 nvcr.io/nvidia/ai-dynamo/sglang-runtime:latest # or vllm, tensorrtllm
 ```
-
-## Next Steps
-
-- Check the [Support Matrix](./reference/support-matrix) for compatible versions
-- Try the [Examples](./examples) to see Dynamo in action
-- Deploy on [Kubernetes](./kubernetes/README) for production workloads

fern/fern/pages/getting-started/intro.mdx

Lines changed: 0 additions & 28 deletions
@@ -70,31 +70,3 @@ curl localhost:8000/v1/chat/completions \
 | **KV Cache Routing** | Intelligent request routing based on KV cache state |
 | **Kubernetes Native** | Full operator and Helm chart support |
 | **Observability** | Prometheus metrics, Grafana dashboards, and tracing |
-
-## Documentation Overview
-
-### Backends
-- [vLLM Backend](./backends/vllm/README) - High-throughput serving with vLLM
-- [SGLang Backend](./backends/sglang/README) - Fast inference with SGLang
-- [TensorRT-LLM Backend](./backends/trtllm/README) - Optimized inference with TensorRT-LLM
-
-### Kubernetes Deployment
-- [Installation Guide](./kubernetes/installation_guide) - Deploy Dynamo on Kubernetes
-- [Operator Guide](./kubernetes/dynamo_operator) - Using the Dynamo Operator
-- [Autoscaling](./kubernetes/autoscaling) - Automatic scaling configuration
-
-### Architecture
-- [System Architecture](./design-docs/architecture) - Overall system design
-- [Disaggregated Serving](./design-docs/disagg-serving) - P/D separation architecture
-- [Distributed Runtime](./design-docs/distributed_runtime) - Runtime internals
-
-### Performance & Tuning
-- [Performance Tuning](./performance/tuning) - Optimize your deployment
-- [Benchmarking](./benchmarks/benchmarking) - Measure and compare performance
-- [AI Configurator](./performance/aiconfigurator) - Automated configuration
-
-## Getting Help
-
-- **GitHub Issues**: [Report bugs or request features](https://github.com/ai-dynamo/dynamo/issues)
-- **Discussions**: [Ask questions and share ideas](https://github.com/ai-dynamo/dynamo/discussions)
-- **Reference**: [CLI Reference](./reference/cli) | [Glossary](./reference/glossary) | [Support Matrix](./reference/support-matrix)

fern/fern/pages/getting-started/quickstart.mdx

Lines changed: 0 additions & 28 deletions
@@ -70,31 +70,3 @@ curl localhost:8000/v1/chat/completions \
 | **KV Cache Routing** | Intelligent request routing based on KV cache state |
 | **Kubernetes Native** | Full operator and Helm chart support |
 | **Observability** | Prometheus metrics, Grafana dashboards, and tracing |
-
-## Documentation Overview
-
-### Backends
-- [vLLM Backend](./backends/vllm/README) - High-throughput serving with vLLM
-- [SGLang Backend](./backends/sglang/README) - Fast inference with SGLang
-- [TensorRT-LLM Backend](./backends/trtllm/README) - Optimized inference with TensorRT-LLM
-
-### Kubernetes Deployment
-- [Installation Guide](./kubernetes/installation_guide) - Deploy Dynamo on Kubernetes
-- [Operator Guide](./kubernetes/dynamo_operator) - Using the Dynamo Operator
-- [Autoscaling](./kubernetes/autoscaling) - Automatic scaling configuration
-
-### Architecture
-- [System Architecture](./design-docs/architecture) - Overall system design
-- [Disaggregated Serving](./design-docs/disagg-serving) - P/D separation architecture
-- [Distributed Runtime](./design-docs/distributed_runtime) - Runtime internals
-
-### Performance & Tuning
-- [Performance Tuning](./performance/tuning) - Optimize your deployment
-- [Benchmarking](./benchmarks/benchmarking) - Measure and compare performance
-- [AI Configurator](./performance/aiconfigurator) - Automated configuration
-
-## Getting Help
-
-- **GitHub Issues**: [Report bugs or request features](https://github.com/ai-dynamo/dynamo/issues)
-- **Discussions**: [Ask questions and share ideas](https://github.com/ai-dynamo/dynamo/discussions)
-- **Reference**: [CLI Reference](./reference/cli) | [Glossary](./reference/glossary) | [Support Matrix](./reference/support-matrix)

fern/fern/pages/observability/health-checks.mdx

Lines changed: 2 additions & 2 deletions
@@ -341,6 +341,6 @@ ERROR Health check request failed for generate: connection refused
 
 ## Related Documentation
 
-- [Distributed Runtime Architecture](../design_docs/distributed_runtime)
-- [Dynamo Architecture Overview](../design_docs/architecture)
+- [Distributed Runtime Architecture](../design-docs/distributed_runtime)
+- [Dynamo Architecture Overview](../design-docs/architecture)
 - [Backend Guide](../development/backend-guide)
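
The link fixes in this commit follow a small, regular pattern: the path segments `design_docs`, `fault_tolerance`, and `disagg_serving` become `design-docs`, `fault-tolerance`, and `disagg-serving`, while other underscore segments (`request_migration`, `distributed_runtime`, `kv_cache_routing`) stay as they are. The following is a minimal Python sketch of that rewrite, assuming these three renames are the complete set; only part of the 14 changed files is visible here, so that may not hold. `RENAMES`, `rewrite_target`, and `fix_links` are hypothetical names for illustration and are not part of the Dynamo repository.

```python
import re

# Path segments renamed in this commit (old -> new). This table is an
# assumption read off the visible hunks; segments such as
# request_migration are deliberately left untouched.
RENAMES = {
    "design_docs": "design-docs",
    "fault_tolerance": "fault-tolerance",
    "disagg_serving": "disagg-serving",
}

# Matches the target of an inline Markdown link: "](target)".
LINK_TARGET = re.compile(r"\]\(([^)\s]+)\)")


def rewrite_target(target: str) -> str:
    """Rename whole path segments of a relative link, keeping any #anchor."""
    if "://" in target:
        return target  # absolute URLs are left alone
    path, sep, anchor = target.partition("#")
    segments = [RENAMES.get(seg, seg) for seg in path.split("/")]
    return "/".join(segments) + sep + anchor


def fix_links(text: str) -> str:
    """Apply rewrite_target to every inline Markdown link in text."""
    return LINK_TARGET.sub(
        lambda m: "](" + rewrite_target(m.group(1)) + ")", text
    )
```

Matching whole segments rather than substrings keeps the rewrite from touching names that merely contain one of the old segments, and splitting off the `#anchor` first leaves fragments such as `#conditional-disaggregation` unchanged.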
