Commit 53da855 (parent: b4b921e)

Fixed some of the links to use slug paths

Signed-off-by: Jont828 <jt572@cornell.edu>


45 files changed: +139 −139 lines

fern/fern/pages/api/nixl_connect/README.mdx

Lines changed: 1 addition & 1 deletion

@@ -94,7 +94,7 @@ When RDMA isn't available, the NIXL data transfer will still complete using non-

 ### Multimodal Example

-In the case of the [Dynamo Multimodal Disaggregated Example](../../multimodal/vllm):
+In the case of the [Dynamo Multimodal Disaggregated Example](/additional-resources/multimodal-details/vllm):

 1. The HTTP frontend accepts a text prompt and a URL to an image.

fern/fern/pages/backends/sglang/README.mdx

Lines changed: 8 additions & 8 deletions

@@ -36,12 +36,12 @@ git checkout $(git describe --tags $(git rev-list --tags --max-count=1))

 | Feature | SGLang | Notes |
 |---------|--------|-------|
-| [**Disaggregated Serving**](../../design-docs/disagg-serving) || |
-| [**Conditional Disaggregation**](../../design-docs/disagg-serving#conditional-disaggregation) | 🚧 | WIP [PR](https://github.com/sgl-project/sglang/pull/7730) |
-| [**KV-Aware Routing**](../../router/kv_cache_routing) || |
-| [**SLA-Based Planner**](../../planner/sla_planner) || |
-| [**Multimodal Support**](../../multimodal/sglang) || |
-| [**KVBM**](../../kvbm/kvbm_architecture) || Planned |
+| [**Disaggregated Serving**](/design-docs/disaggregated-serving) || |
+| [**Conditional Disaggregation**](/design-docs/disaggregated-serving#conditional-disaggregation) | 🚧 | WIP [PR](https://github.com/sgl-project/sglang/pull/7730) |
+| [**KV-Aware Routing**](/additional-resources/router-details/kv-cache-routing) || |
+| [**SLA-Based Planner**](/components/planner/sla-based-planner) || |
+| [**Multimodal Support**](/additional-resources/multimodal-details/sglang) || |
+| [**KVBM**](/components/kvbm/architecture) || Planned |


 ## Dynamo SGLang Integration

@@ -57,7 +57,7 @@ Dynamo SGLang uses SGLang's native argument parser, so **most SGLang engine argu
 | Argument | Description | Default | SGLang Equivalent |
 |----------|-------------|---------|-------------------|
 | `--endpoint` | Dynamo endpoint in `dyn://namespace.component.endpoint` format | Auto-generated based on mode | N/A |
-| `--migration-limit` | Max times a request can migrate between workers for fault tolerance. See [Request Migration Architecture](../../fault-tolerance/request_migration). | `0` (disabled) | N/A |
+| `--migration-limit` | Max times a request can migrate between workers for fault tolerance. See [Request Migration Architecture](/additional-resources/fault-tolerance/request-migration). | `0` (disabled) | N/A |
 | `--dyn-tool-call-parser` | Tool call parser for structured outputs (takes precedence over `--tool-call-parser`) | `None` | `--tool-call-parser` |
 | `--dyn-reasoning-parser` | Reasoning parser for CoT models (takes precedence over `--reasoning-parser`) | `None` | `--reasoning-parser` |
 | `--use-sglang-tokenizer` | Use SGLang's tokenizer instead of Dynamo's | `False` | N/A |

@@ -87,7 +87,7 @@ When a user cancels a request (e.g., by disconnecting from the frontend), the re
 ⚠️ SGLang backend currently does not support cancellation during remote prefill phase in disaggregated mode.
 </Callout>

-For more details, see the [Request Cancellation Architecture](../../fault-tolerance/request_cancellation) documentation.
+For more details, see the [Request Cancellation Architecture](/additional-resources/fault-tolerance/request-cancellation) documentation.

 ## Installation

fern/fern/pages/backends/sglang/prometheus.mdx

Lines changed: 5 additions & 5 deletions

@@ -13,9 +13,9 @@ When running SGLang through Dynamo, SGLang engine metrics are automatically pass

 **For the complete and authoritative list of all SGLang metrics**, always refer to the [official SGLang Production Metrics documentation](https://docs.sglang.ai/references/production_metrics.html).

-**For Dynamo runtime metrics**, see the [Dynamo Metrics Guide](../../observability/metrics).
+**For Dynamo runtime metrics**, see the [Dynamo Metrics Guide](/user-guides/observability-local/metrics).

-**For visualization setup instructions**, see the [Prometheus and Grafana Setup Guide](../../observability/prometheus-grafana).
+**For visualization setup instructions**, see the [Prometheus and Grafana Setup Guide](/user-guides/observability-local/prometheus-grafana).

 ## Environment Variables

@@ -29,7 +29,7 @@ This is a single machine example.

 ### Start Observability Stack

-For visualizing metrics with Prometheus and Grafana, start the observability stack. See [Observability Getting Started](../../observability/README#getting-started-quickly) for instructions.
+For visualizing metrics with Prometheus and Grafana, start the observability stack. See [Observability Getting Started](/user-guides/observability-local/overview#getting-started-quickly) for instructions.

 ### Launch Dynamo Components

@@ -117,8 +117,8 @@ For the complete and authoritative list of all SGLang metrics, see the [official
 - [SGLang GitHub - Metrics Collector](https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/metrics/collector.py)

 ### Dynamo Metrics
-- [Dynamo Metrics Guide](../../observability/metrics) - Complete documentation on Dynamo runtime metrics
-- [Prometheus and Grafana Setup](../../observability/prometheus-grafana) - Visualization setup instructions
+- [Dynamo Metrics Guide](/user-guides/observability-local/metrics) - Complete documentation on Dynamo runtime metrics
+- [Prometheus and Grafana Setup](/user-guides/observability-local/prometheus-grafana) - Visualization setup instructions
 - Dynamo runtime metrics (prefixed with `dynamo_*`) are available at the same `/metrics` endpoint alongside SGLang metrics
 - Implementation: `lib/runtime/src/metrics.rs` (Rust runtime metrics)
 - Metric names: `lib/runtime/src/metrics/prometheus_names.rs` (metric name constants)

fern/fern/pages/backends/trtllm/README.mdx

Lines changed: 14 additions & 14 deletions

@@ -41,12 +41,12 @@ git checkout $(git describe --tags $(git rev-list --tags --max-count=1))

 | Feature | TensorRT-LLM | Notes |
 |---------|--------------|-------|
-| [**Disaggregated Serving**](../../design-docs/disagg-serving) || |
-| [**Conditional Disaggregation**](../../design-docs/disagg-serving#conditional-disaggregation) | 🚧 | Not supported yet |
-| [**KV-Aware Routing**](../../router/kv_cache_routing) || |
-| [**SLA-Based Planner**](../../planner/sla_planner) || |
-| [**Load Based Planner**](../../planner/load_planner) | 🚧 | Planned |
-| [**KVBM**](../../kvbm/kvbm_architecture) || |
+| [**Disaggregated Serving**](/design-docs/disaggregated-serving) || |
+| [**Conditional Disaggregation**](/design-docs/disaggregated-serving#conditional-disaggregation) | 🚧 | Not supported yet |
+| [**KV-Aware Routing**](/additional-resources/router-details/kv-cache-routing) || |
+| [**SLA-Based Planner**](/components/planner/sla-based-planner) || |
+| [**Load Based Planner**](/additional-resources/load-planner) | 🚧 | Planned |
+| [**KVBM**](/components/kvbm/architecture) || |

 ### Large Scale P/D and WideEP Features

@@ -98,7 +98,7 @@ apt-get update && apt-get -y install git git-lfs
 Below we provide some simple shell scripts that run the components for each configuration. Each shell script is simply running the `python3 -m dynamo.frontend <args>` to start up the ingress and using `python3 -m dynamo.trtllm <args>` to start up the workers. You can easily take each command and run them in separate terminals.
 </Callout>

-For detailed information about the architecture and how KV-aware routing works, see the [KV Cache Routing documentation](../../router/kv_cache_routing).
+For detailed information about the architecture and how KV-aware routing works, see the [KV Cache Routing documentation](/additional-resources/router-details/kv-cache-routing).

 ### Aggregated
 ```bash

@@ -151,7 +151,7 @@ Below we provide a selected list of advanced examples. Please open up an issue i

 ### Multinode Deployment

-For comprehensive instructions on multinode serving, see the [multinode-examples.md](./multinode/multinode-examples) guide. It provides step-by-step deployment examples and configuration tips for running Dynamo with TensorRT-LLM across multiple nodes. While the walkthrough uses DeepSeek-R1 as the model, you can easily adapt the process for any supported model by updating the relevant configuration files. You can see [Llama4+eagle](./llama4_plus_eagle) guide to learn how to use these scripts when a single worker fits on the single node.
+For comprehensive instructions on multinode serving, see the [multinode-examples.md](/additional-resources/backend-details/tensorrt-llm/multinode-examples) guide. It provides step-by-step deployment examples and configuration tips for running Dynamo with TensorRT-LLM across multiple nodes. While the walkthrough uses DeepSeek-R1 as the model, you can easily adapt the process for any supported model by updating the relevant configuration files. You can see [Llama4+eagle](/additional-resources/backend-details/tensorrt-llm/llama-4-eagle) guide to learn how to use these scripts when a single worker fits on the single node.

 ### Speculative Decoding
 - **[Llama 4 Maverick Instruct + Eagle Speculative Decoding](./llama4_plus_eagle)**

@@ -162,7 +162,7 @@ For complete Kubernetes deployment instructions, configurations, and troubleshoo

 ### Client

-See [client](../../backends/sglang/README#testing-the-deployment) section to learn how to send request to the deployment.
+See [client](/components/backends/sglang#testing-the-deployment) section to learn how to send request to the deployment.

 NOTE: To send a request to a multi-node deployment, target the node which is running `python3 -m dynamo.frontend <args>`.

@@ -178,7 +178,7 @@ Dynamo with TensorRT-LLM supports two methods for transferring KV cache in disag

 ## Request Migration

-You can enable [request migration](../../fault-tolerance/request_migration) to handle worker failures gracefully. Use the `--migration-limit` flag to specify how many times a request can be migrated to another worker:
+You can enable [request migration](/additional-resources/fault-tolerance/request-migration) to handle worker failures gracefully. Use the `--migration-limit` flag to specify how many times a request can be migrated to another worker:

 ```bash
 # For decode and aggregated workers

@@ -189,7 +189,7 @@ python3 -m dynamo.trtllm ... --migration-limit=3
 **Prefill workers do not support request migration** and must use `--migration-limit=0` (the default). Prefill workers only process prompts and return KV cache state - they don't maintain long-running generation requests that would benefit from migration.
 </Callout>

-See the [Request Migration Architecture](../../fault-tolerance/request_migration) documentation for details on how this works.
+See the [Request Migration Architecture](/additional-resources/fault-tolerance/request-migration) documentation for details on how this works.

 ## Request Cancellation

@@ -202,11 +202,11 @@ When a user cancels a request (e.g., by disconnecting from the frontend), the re
 | **Aggregated** |||
 | **Disaggregated** |||

-For more details, see the [Request Cancellation Architecture](../../fault-tolerance/request_cancellation) documentation.
+For more details, see the [Request Cancellation Architecture](/additional-resources/fault-tolerance/request-cancellation) documentation.

 ## Client

-See [client](../../backends/sglang/README#testing-the-deployment) section to learn how to send request to the deployment.
+See [client](/components/backends/sglang#testing-the-deployment) section to learn how to send request to the deployment.

 NOTE: To send a request to a multi-node deployment, target the node which is running `python3 -m dynamo.frontend <args>`.

@@ -217,7 +217,7 @@ To benchmark your deployment with AIPerf, see this utility script, configuring t

 ## Multimodal support

-Dynamo with the TensorRT-LLM backend supports multimodal models, enabling you to process both text and images (or pre-computed embeddings) in a single request. For detailed setup instructions, example requests, and best practices, see the [TensorRT-LLM Multimodal Guide](../../multimodal/trtllm).
+Dynamo with the TensorRT-LLM backend supports multimodal models, enabling you to process both text and images (or pre-computed embeddings) in a single request. For detailed setup instructions, example requests, and best practices, see the [TensorRT-LLM Multimodal Guide](/additional-resources/multimodal-details/tensorrt-llm).

 ## Logits Processing
fern/fern/pages/backends/trtllm/llama4_plus_eagle.mdx

Lines changed: 3 additions & 3 deletions

@@ -7,7 +7,7 @@ title: "Llama 4 Maverick Instruct with Eagle Speculative Decoding on SLURM"
 SPDX-License-Identifier: Apache-2.0
 */}

-This guide demonstrates how to deploy Llama 4 Maverick Instruct with Eagle Speculative Decoding on GB200x4 nodes. We will be following the [multi-node deployment instructions](./multinode/multinode-examples) to set up the environment for the following scenarios:
+This guide demonstrates how to deploy Llama 4 Maverick Instruct with Eagle Speculative Decoding on GB200x4 nodes. We will be following the [multi-node deployment instructions](/additional-resources/backend-details/tensorrt-llm/multinode-examples) to set up the environment for the following scenarios:

 - **Aggregated Serving:**
 Deploy the entire Llama 4 model on a single GB200x4 node for end-to-end serving.

@@ -36,7 +36,7 @@ export MODEL_PATH="nvidia/Llama-4-Maverick-17B-128E-Instruct-FP8"
 export SERVED_MODEL_NAME="nvidia/Llama-4-Maverick-17B-128E-Instruct-FP8"
 ```

-See [this](./multinode/multinode-examples#setup) section from multinode guide to learn more about the above options.
+See [this](/additional-resources/backend-details/tensorrt-llm/multinode-examples#setup) section from multinode guide to learn more about the above options.


 ## Aggregated Serving

@@ -58,7 +58,7 @@ export DECODE_ENGINE_CONFIG="/mnt/examples/backends/trtllm/engine_configs/llama4

 ## Example Request

-See [here](./multinode/multinode-examples#example-request) to learn how to send a request to the deployment.
+See [here](/additional-resources/backend-details/tensorrt-llm/multinode-examples#example-request) to learn how to send a request to the deployment.

 ```
 curl localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{

fern/fern/pages/backends/trtllm/prometheus.mdx

Lines changed: 5 additions & 5 deletions

@@ -15,9 +15,9 @@ Additional performance metrics are available via non-Prometheus APIs (see [Non-P

 As of the date of this documentation, the included TensorRT-LLM version 1.1.0rc5 exposes **5 basic Prometheus metrics**. Note that the `trtllm_` prefix is added by Dynamo.

-**For Dynamo runtime metrics**, see the [Dynamo Metrics Guide](../../observability/metrics).
+**For Dynamo runtime metrics**, see the [Dynamo Metrics Guide](/user-guides/observability-local/metrics).

-**For visualization setup instructions**, see the [Prometheus and Grafana Setup Guide](../../observability/prometheus-grafana).
+**For visualization setup instructions**, see the [Prometheus and Grafana Setup Guide](/user-guides/observability-local/prometheus-grafana).

 ## Environment Variables

@@ -31,7 +31,7 @@ This is a single machine example.

 ### Start Observability Stack

-For visualizing metrics with Prometheus and Grafana, start the observability stack. See [Observability Getting Started](../../observability/README#getting-started-quickly) for instructions.
+For visualizing metrics with Prometheus and Grafana, start the observability stack. See [Observability Getting Started](/user-guides/observability-local/overview#getting-started-quickly) for instructions.

 ### Launch Dynamo Components

@@ -187,8 +187,8 @@ TensorRT-LLM provides extensive performance data beyond the basic Prometheus met
 - [TensorRT-LLM Metrics Collector](https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/metrics/collector.py) - Source code reference

 ### Dynamo Metrics
-- [Dynamo Metrics Guide](../../observability/metrics) - Complete documentation on Dynamo runtime metrics
-- [Prometheus and Grafana Setup](../../observability/prometheus-grafana) - Visualization setup instructions
+- [Dynamo Metrics Guide](/user-guides/observability-local/metrics) - Complete documentation on Dynamo runtime metrics
+- [Prometheus and Grafana Setup](/user-guides/observability-local/prometheus-grafana) - Visualization setup instructions
 - Dynamo runtime metrics (prefixed with `dynamo_*`) are available at the same `/metrics` endpoint alongside TensorRT-LLM metrics
 - Implementation: `lib/runtime/src/metrics.rs` (Rust runtime metrics)
 - Metric names: `lib/runtime/src/metrics/prometheus_names.rs` (metric name constants)

fern/fern/pages/backends/vllm/README.mdx

Lines changed: 12 additions & 12 deletions

@@ -37,14 +37,14 @@ git checkout $(git describe --tags $(git rev-list --tags --max-count=1))

 | Feature | vLLM | Notes |
 |---------|------|-------|
-| [**Disaggregated Serving**](../../design-docs/disagg-serving) || |
-| [**Conditional Disaggregation**](../../design-docs/disagg-serving#conditional-disaggregation) | 🚧 | WIP |
-| [**KV-Aware Routing**](../../router/kv_cache_routing) || |
-| [**SLA-Based Planner**](../../planner/sla_planner) || |
-| [**Load Based Planner**](../../planner/load_planner) | 🚧 | WIP |
-| [**KVBM**](../../kvbm/kvbm_architecture) || |
-| [**LMCache**](./LMCache_Integration) || |
-| [**Prompt Embeddings**](./prompt-embeddings) || Requires `--enable-prompt-embeds` flag |
+| [**Disaggregated Serving**](/design-docs/disaggregated-serving) || |
+| [**Conditional Disaggregation**](/design-docs/disaggregated-serving#conditional-disaggregation) | 🚧 | WIP |
+| [**KV-Aware Routing**](/additional-resources/router-details/kv-cache-routing) || |
+| [**SLA-Based Planner**](/components/planner/sla-based-planner) || |
+| [**Load Based Planner**](/additional-resources/load-planner) | 🚧 | WIP |
+| [**KVBM**](/components/kvbm/architecture) || |
+| [**LMCache**](/components/kvbm/lm-cache-integration) || |
+| [**Prompt Embeddings**](/additional-resources/backend-details/vllm/prompt-embeddings) || Requires `--enable-prompt-embeds` flag |

 ### Large Scale P/D and WideEP Features

@@ -176,17 +176,17 @@ When using KV-aware routing, ensure deterministic hashing across processes to av
 ```bash
 vllm serve ... --enable-prefix-caching --prefix-caching-algo sha256
 ```
-See the high-level notes in [KV Cache Routing](../../router/kv_cache_routing) on deterministic event IDs.
+See the high-level notes in [KV Cache Routing](/additional-resources/router-details/kv-cache-routing) on deterministic event IDs.

 ## Request Migration

-You can enable [request migration](../../fault-tolerance/request_migration) to handle worker failures gracefully. Use the `--migration-limit` flag to specify how many times a request can be migrated to another worker:
+You can enable [request migration](/additional-resources/fault-tolerance/request-migration) to handle worker failures gracefully. Use the `--migration-limit` flag to specify how many times a request can be migrated to another worker:

 ```bash
 python3 -m dynamo.vllm ... --migration-limit=3
 ```

-This allows a request to be migrated up to 3 times before failing. See the [Request Migration Architecture](../../fault-tolerance/request_migration) documentation for details on how this works.
+This allows a request to be migrated up to 3 times before failing. See the [Request Migration Architecture](/additional-resources/fault-tolerance/request-migration) documentation for details on how this works.

 ## Request Cancellation

@@ -199,4 +199,4 @@ When a user cancels a request (e.g., by disconnecting from the frontend), the re
 | **Aggregated** |||
 | **Disaggregated** |||

-For more details, see the [Request Cancellation Architecture](../../fault-tolerance/request_cancellation) documentation.
+For more details, see the [Request Cancellation Architecture](/additional-resources/fault-tolerance/request-cancellation) documentation.

fern/fern/pages/backends/vllm/deepseek-r1.mdx

Lines changed: 1 addition & 1 deletion

@@ -11,7 +11,7 @@ Dynamo supports running Deepseek R1 with data parallel attention and wide expert

 ## Instructions

-The following script can be adapted to run Deepseek R1 with a variety of different configuration. The current configuration uses 2 nodes, 16 GPUs, and a dp of 16. Follow the [ReadMe](README) Getting Started section on each node, and then run these two commands.
+The following script can be adapted to run Deepseek R1 with a variety of different configuration. The current configuration uses 2 nodes, 16 GPUs, and a dp of 16. Follow the [vLLM Backend](/components/backends/vllm) Getting Started section on each node, and then run these two commands.

 node 0
 ```bash
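Every hunk in this commit is the same mechanical rewrite: a relative MDX link target becomes an absolute slug path, with any `#anchor` fragment preserved. A change like this can be scripted; below is a minimal sketch of that rewrite, not necessarily how this commit was actually produced. The `SLUG_MAP` entries and the `fern/fern/pages` docs root are taken from the hunks above; a real run would need the full mapping for all 45 files.

```python
import re
from pathlib import Path

# Old relative link target -> new slug path. Entries taken from the
# substitutions visible in this commit's hunks (partial list).
SLUG_MAP = {
    "../../design-docs/disagg-serving": "/design-docs/disaggregated-serving",
    "../../router/kv_cache_routing": "/additional-resources/router-details/kv-cache-routing",
    "../../planner/sla_planner": "/components/planner/sla-based-planner",
    "../../fault-tolerance/request_migration": "/additional-resources/fault-tolerance/request-migration",
    "../../fault-tolerance/request_cancellation": "/additional-resources/fault-tolerance/request-cancellation",
    "./multinode/multinode-examples": "/additional-resources/backend-details/tensorrt-llm/multinode-examples",
}

# Matches a markdown link target, splitting off an optional #anchor.
LINK_RE = re.compile(r"\]\(([^)#]+)(#[^)]*)?\)")

def rewrite_links(text: str) -> str:
    """Replace mapped relative targets with slug paths, keeping anchors."""
    def repl(m: re.Match) -> str:
        target, anchor = m.group(1), m.group(2) or ""
        return f"]({SLUG_MAP.get(target, target)}{anchor})"
    return LINK_RE.sub(repl, text)

if __name__ == "__main__":
    for path in Path("fern/fern/pages").rglob("*.mdx"):
        path.write_text(rewrite_links(path.read_text()))
```

Unmapped targets pass through unchanged, so external URLs such as the sgl-project PR link are untouched, and anchors like `#conditional-disaggregation` survive the rewrite of their base path.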
