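Fix relative links in Fern docs pages

Update link targets to the hyphenated paths: `design_docs` -> `design-docs`,
`disagg_serving` -> `disagg-serving`, and `fault_tolerance` -> `fault-tolerance`.
Remove the "Next Steps", "Documentation Overview", and "Getting Help" link
sections from the getting-started pages.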
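
A minimal sketch, not part of this commit, of how remaining stale link targets could be spotted after a rename like this one. The old and new segment names come from the diff below; the function name `find_stale_links` and the link regex are assumptions made for illustration, and the regex only handles simple inline Markdown links.

```python
import re

# Old -> new path segments, taken from the renames in this diff.
STALE_SEGMENTS = {
    "design_docs": "design-docs",
    "disagg_serving": "disagg-serving",
    "fault_tolerance": "fault-tolerance",
}

# Matches the target of a simple inline Markdown link: [text](target)
LINK_TARGET = re.compile(r"\]\(([^)\s]+)\)")


def find_stale_links(text):
    """Return (line_number, target) pairs whose link target has a stale segment."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for target in LINK_TARGET.findall(line):
            # Drop the #anchor part and compare whole path segments only,
            # so names such as `request_migration` are not flagged.
            segments = target.split("#", 1)[0].split("/")
            if any(segment in STALE_SEGMENTS for segment in segments):
                hits.append((lineno, target))
    return hits
```

Running it over the contents of each `.mdx` file should return an empty list once every link uses the hyphenated paths.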

Jont828 · Jont828 · commit b4b921ef3b4d · 2026-01-15T16:30:31.000-05:00
diff --git a/fern/fern/pages/backends/sglang/README.mdx b/fern/fern/pages/backends/sglang/README.mdx
@@ -36,8 +36,8 @@ git checkout $(git describe --tags $(git rev-list --tags --max-count=1))
 
 | Feature | SGLang | Notes |
 |---------|--------|-------|
-| [**Disaggregated Serving**](../../design_docs/disagg_serving) | ✅ |  |
-| [**Conditional Disaggregation**](../../design_docs/disagg_serving#conditional-disaggregation) | 🚧 | WIP [PR](https://github.com/sgl-project/sglang/pull/7730) |
+| [**Disaggregated Serving**](../../design-docs/disagg-serving) | ✅ |  |
+| [**Conditional Disaggregation**](../../design-docs/disagg-serving#conditional-disaggregation) | 🚧 | WIP [PR](https://github.com/sgl-project/sglang/pull/7730) |
 | [**KV-Aware Routing**](../../router/kv_cache_routing) | ✅ |  |
 | [**SLA-Based Planner**](../../planner/sla_planner) | ✅ |  |
 | [**Multimodal Support**](../../multimodal/sglang) | ✅ |  |
@@ -57,7 +57,7 @@ Dynamo SGLang uses SGLang's native argument parser, so **most SGLang engine argu
 | Argument | Description | Default | SGLang Equivalent |
 |----------|-------------|---------|-------------------|
 | `--endpoint` | Dynamo endpoint in `dyn://namespace.component.endpoint` format | Auto-generated based on mode | N/A |
-| `--migration-limit` | Max times a request can migrate between workers for fault tolerance. See [Request Migration Architecture](../../fault_tolerance/request_migration). | `0` (disabled) | N/A |
+| `--migration-limit` | Max times a request can migrate between workers for fault tolerance. See [Request Migration Architecture](../../fault-tolerance/request_migration). | `0` (disabled) | N/A |
 | `--dyn-tool-call-parser` | Tool call parser for structured outputs (takes precedence over `--tool-call-parser`) | `None` | `--tool-call-parser` |
 | `--dyn-reasoning-parser` | Reasoning parser for CoT models (takes precedence over `--reasoning-parser`) | `None` | `--reasoning-parser` |
 | `--use-sglang-tokenizer` | Use SGLang's tokenizer instead of Dynamo's | `False` | N/A |
@@ -87,7 +87,7 @@ When a user cancels a request (e.g., by disconnecting from the frontend), the re
 ⚠️ SGLang backend currently does not support cancellation during remote prefill phase in disaggregated mode.
 </Callout>
 
-For more details, see the [Request Cancellation Architecture](../../fault_tolerance/request_cancellation) documentation.
+For more details, see the [Request Cancellation Architecture](../../fault-tolerance/request_cancellation) documentation.
 
 ## Installation
 
diff --git a/fern/fern/pages/backends/trtllm/README.mdx b/fern/fern/pages/backends/trtllm/README.mdx
@@ -41,8 +41,8 @@ git checkout $(git describe --tags $(git rev-list --tags --max-count=1))
 
 | Feature | TensorRT-LLM | Notes |
 |---------|--------------|-------|
-| [**Disaggregated Serving**](../../design_docs/disagg_serving) | ✅ |  |
-| [**Conditional Disaggregation**](../../design_docs/disagg_serving#conditional-disaggregation) | 🚧 | Not supported yet |
+| [**Disaggregated Serving**](../../design-docs/disagg-serving) | ✅ |  |
+| [**Conditional Disaggregation**](../../design-docs/disagg-serving#conditional-disaggregation) | 🚧 | Not supported yet |
 | [**KV-Aware Routing**](../../router/kv_cache_routing) | ✅ |  |
 | [**SLA-Based Planner**](../../planner/sla_planner) | ✅ |  |
 | [**Load Based Planner**](../../planner/load_planner) | 🚧 | Planned |
@@ -178,7 +178,7 @@ Dynamo with TensorRT-LLM supports two methods for transferring KV cache in disag
 
 ## Request Migration
 
-You can enable [request migration](../../fault_tolerance/request_migration) to handle worker failures gracefully. Use the `--migration-limit` flag to specify how many times a request can be migrated to another worker:
+You can enable [request migration](../../fault-tolerance/request_migration) to handle worker failures gracefully. Use the `--migration-limit` flag to specify how many times a request can be migrated to another worker:
 
 ```bash
 # For decode and aggregated workers
@@ -189,7 +189,7 @@ python3 -m dynamo.trtllm ... --migration-limit=3
 **Prefill workers do not support request migration** and must use `--migration-limit=0` (the default). Prefill workers only process prompts and return KV cache state - they don't maintain long-running generation requests that would benefit from migration.
 </Callout>
 
-See the [Request Migration Architecture](../../fault_tolerance/request_migration) documentation for details on how this works.
+See the [Request Migration Architecture](../../fault-tolerance/request_migration) documentation for details on how this works.
 
 ## Request Cancellation
 
@@ -202,7 +202,7 @@ When a user cancels a request (e.g., by disconnecting from the frontend), the re
 | **Aggregated** | ✅ | ✅ |
 | **Disaggregated** | ✅ | ✅ |
 
-For more details, see the [Request Cancellation Architecture](../../fault_tolerance/request_cancellation) documentation.
+For more details, see the [Request Cancellation Architecture](../../fault-tolerance/request_cancellation) documentation.
 
 ## Client
 
diff --git a/fern/fern/pages/backends/vllm/README.mdx b/fern/fern/pages/backends/vllm/README.mdx
@@ -37,8 +37,8 @@ git checkout $(git describe --tags $(git rev-list --tags --max-count=1))
 
 | Feature | vLLM | Notes |
 |---------|------|-------|
-| [**Disaggregated Serving**](../../design_docs/disagg_serving) | ✅ |  |
-| [**Conditional Disaggregation**](../../design_docs/disagg_serving#conditional-disaggregation) | 🚧 | WIP |
+| [**Disaggregated Serving**](../../design-docs/disagg-serving) | ✅ |  |
+| [**Conditional Disaggregation**](../../design-docs/disagg-serving#conditional-disaggregation) | 🚧 | WIP |
 | [**KV-Aware Routing**](../../router/kv_cache_routing) | ✅ |  |
 | [**SLA-Based Planner**](../../planner/sla_planner) | ✅ |  |
 | [**Load Based Planner**](../../planner/load_planner) | 🚧 | WIP |
@@ -180,13 +180,13 @@ See the high-level notes in [KV Cache Routing](../../router/kv_cache_routing) on
 
 ## Request Migration
 
-You can enable [request migration](../../fault_tolerance/request_migration) to handle worker failures gracefully. Use the `--migration-limit` flag to specify how many times a request can be migrated to another worker:
+You can enable [request migration](../../fault-tolerance/request_migration) to handle worker failures gracefully. Use the `--migration-limit` flag to specify how many times a request can be migrated to another worker:
 
 ```bash
 python3 -m dynamo.vllm ... --migration-limit=3
 ```
 
-This allows a request to be migrated up to 3 times before failing. See the [Request Migration Architecture](../../fault_tolerance/request_migration) documentation for details on how this works.
+This allows a request to be migrated up to 3 times before failing. See the [Request Migration Architecture](../../fault-tolerance/request_migration) documentation for details on how this works.
 
 ## Request Cancellation
 
@@ -199,4 +199,4 @@ When a user cancels a request (e.g., by disconnecting from the frontend), the re
 | **Aggregated** | ✅ | ✅ |
 | **Disaggregated** | ✅ | ✅ |
 
-For more details, see the [Request Cancellation Architecture](../../fault_tolerance/request_cancellation) documentation.
+For more details, see the [Request Cancellation Architecture](../../fault-tolerance/request_cancellation) documentation.
diff --git a/fern/fern/pages/design-docs/architecture.mdx b/fern/fern/pages/design-docs/architecture.mdx
@@ -41,7 +41,7 @@ To address the growing demands of distributed inference serving, NVIDIA introduc
 
 The following diagram outlines Dynamo's high-level architecture. To enable large-scale distributed and disaggregated inference serving, Dynamo includes five key features:
 
-- [Dynamo Disaggregated Serving](disagg_serving)
+- [Dynamo Disaggregated Serving](disagg-serving)
 - [Dynamo Smart Router](../router/kv_cache_routing)
 - [Dynamo KV Cache Block Manager](../kvbm/kvbm_intro)
 - [Planner](../planner/planner_intro)
diff --git a/fern/fern/pages/development/backend-guide.mdx b/fern/fern/pages/development/backend-guide.mdx
@@ -74,7 +74,7 @@ The `model_type` can be:
 - `model_name`: The name to call the model. Your incoming HTTP requests model name must match this. Defaults to the hugging face repo name or the folder name.
 - `context_length`: Max model length in tokens. Defaults to the model's set max. Only set this if you need to reduce KV cache allocation to fit into VRAM.
 - `kv_cache_block_size`: Size of a KV block for the engine, in tokens. Defaults to 16.
-- `migration_limit`: Maximum number of times a request may be [migrated to another Instance](../fault_tolerance/request_migration). Defaults to 0.
+- `migration_limit`: Maximum number of times a request may be [migrated to another Instance](../fault-tolerance/request_migration). Defaults to 0.
 - `user_data`: Optional dictionary containing custom metadata for worker behavior (e.g., LoRA configuration). Defaults to None.
 
 See `examples/backends` for full code examples.
@@ -116,7 +116,7 @@ In the P/D disaggregated setup you would have `deepseek-distill-llama8b.prefill.
 
 A Python worker may need to be shut down promptly, for example when the node running the worker is to be reclaimed and there isn't enough time to complete all ongoing requests before the shutdown deadline.
 
-In such cases, you can signal incomplete responses by raising a `GeneratorExit` exception in your generate loop. This will immediately close the response stream, signaling to the frontend that the stream is incomplete. With request migration enabled (see the [`migration_limit`](../fault_tolerance/request_migration) parameter), the frontend will automatically migrate the partially completed request to another worker instance, if available, to be completed.
+In such cases, you can signal incomplete responses by raising a `GeneratorExit` exception in your generate loop. This will immediately close the response stream, signaling to the frontend that the stream is incomplete. With request migration enabled (see the [`migration_limit`](../fault-tolerance/request_migration) parameter), the frontend will automatically migrate the partially completed request to another worker instance, if available, to be completed.
 
 <Callout intent="warning">
 We will update the `GeneratorExit` exception to a new Dynamo exception. Please expect minor code breaking change in the near future.
@@ -140,7 +140,7 @@ class RequestHandler:
 
 When `GeneratorExit` is raised, the frontend receives the incomplete response and can seamlessly continue generation on another available worker instance, preserving the user experience even during worker shutdowns.
 
-For more information about how request migration works, see the [Request Migration Architecture](../fault_tolerance/request_migration) documentation.
+For more information about how request migration works, see the [Request Migration Architecture](../fault-tolerance/request_migration) documentation.
 
 ## Request Cancellation
 
@@ -162,4 +162,4 @@ class RequestHandler:
 
 The context parameter is optional - if your generate method doesn't include it in its signature, Dynamo will call your method without the context argument.
 
-For detailed information about request cancellation, including async cancellation monitoring and context propagation patterns, see the [Request Cancellation Architecture](../fault_tolerance/request_cancellation) documentation.
+For detailed information about request cancellation, including async cancellation monitoring and context propagation patterns, see the [Request Cancellation Architecture](../fault-tolerance/request_cancellation) documentation.
diff --git a/fern/fern/pages/getting-started/examples.mdx b/fern/fern/pages/getting-started/examples.mdx
@@ -51,9 +51,3 @@ cd examples/backends/sglang
 
 # Follow the README in each example directory
 ```
-
-## Next Steps
-
-- See the [Backends documentation](./backends/vllm/README) for detailed backend configuration
-- Check [Kubernetes Deployment](./kubernetes/README) for production deployments
-- Review [User Guides](./agents/tool-calling) for advanced features
diff --git a/fern/fern/pages/getting-started/installation.mdx b/fern/fern/pages/getting-started/installation.mdx
@@ -46,9 +46,3 @@ docker run --rm -it \
   --network host \
   nvcr.io/nvidia/ai-dynamo/sglang-runtime:latest  # or vllm, tensorrtllm
 ```
-
-## Next Steps
-
-- Check the [Support Matrix](./reference/support-matrix) for compatible versions
-- Try the [Examples](./examples) to see Dynamo in action
-- Deploy on [Kubernetes](./kubernetes/README) for production workloads
diff --git a/fern/fern/pages/getting-started/intro.mdx b/fern/fern/pages/getting-started/intro.mdx
@@ -70,31 +70,3 @@ curl localhost:8000/v1/chat/completions \
 | **KV Cache Routing** | Intelligent request routing based on KV cache state |
 | **Kubernetes Native** | Full operator and Helm chart support |
 | **Observability** | Prometheus metrics, Grafana dashboards, and tracing |
-
-## Documentation Overview
-
-### Backends
-- [vLLM Backend](./backends/vllm/README) - High-throughput serving with vLLM
-- [SGLang Backend](./backends/sglang/README) - Fast inference with SGLang
-- [TensorRT-LLM Backend](./backends/trtllm/README) - Optimized inference with TensorRT-LLM
-
-### Kubernetes Deployment
-- [Installation Guide](./kubernetes/installation_guide) - Deploy Dynamo on Kubernetes
-- [Operator Guide](./kubernetes/dynamo_operator) - Using the Dynamo Operator
-- [Autoscaling](./kubernetes/autoscaling) - Automatic scaling configuration
-
-### Architecture
-- [System Architecture](./design-docs/architecture) - Overall system design
-- [Disaggregated Serving](./design-docs/disagg-serving) - P/D separation architecture
-- [Distributed Runtime](./design-docs/distributed_runtime) - Runtime internals
-
-### Performance & Tuning
-- [Performance Tuning](./performance/tuning) - Optimize your deployment
-- [Benchmarking](./benchmarks/benchmarking) - Measure and compare performance
-- [AI Configurator](./performance/aiconfigurator) - Automated configuration
-
-## Getting Help
-
-- **GitHub Issues**: [Report bugs or request features](https://github.com/ai-dynamo/dynamo/issues)
-- **Discussions**: [Ask questions and share ideas](https://github.com/ai-dynamo/dynamo/discussions)
-- **Reference**: [CLI Reference](./reference/cli) | [Glossary](./reference/glossary) | [Support Matrix](./reference/support-matrix)
diff --git a/fern/fern/pages/getting-started/quickstart.mdx b/fern/fern/pages/getting-started/quickstart.mdx
@@ -70,31 +70,3 @@ curl localhost:8000/v1/chat/completions \
 | **KV Cache Routing** | Intelligent request routing based on KV cache state |
 | **Kubernetes Native** | Full operator and Helm chart support |
 | **Observability** | Prometheus metrics, Grafana dashboards, and tracing |
-
-## Documentation Overview
-
-### Backends
-- [vLLM Backend](./backends/vllm/README) - High-throughput serving with vLLM
-- [SGLang Backend](./backends/sglang/README) - Fast inference with SGLang
-- [TensorRT-LLM Backend](./backends/trtllm/README) - Optimized inference with TensorRT-LLM
-
-### Kubernetes Deployment
-- [Installation Guide](./kubernetes/installation_guide) - Deploy Dynamo on Kubernetes
-- [Operator Guide](./kubernetes/dynamo_operator) - Using the Dynamo Operator
-- [Autoscaling](./kubernetes/autoscaling) - Automatic scaling configuration
-
-### Architecture
-- [System Architecture](./design-docs/architecture) - Overall system design
-- [Disaggregated Serving](./design-docs/disagg-serving) - P/D separation architecture
-- [Distributed Runtime](./design-docs/distributed_runtime) - Runtime internals
-
-### Performance & Tuning
-- [Performance Tuning](./performance/tuning) - Optimize your deployment
-- [Benchmarking](./benchmarks/benchmarking) - Measure and compare performance
-- [AI Configurator](./performance/aiconfigurator) - Automated configuration
-
-## Getting Help
-
-- **GitHub Issues**: [Report bugs or request features](https://github.com/ai-dynamo/dynamo/issues)
-- **Discussions**: [Ask questions and share ideas](https://github.com/ai-dynamo/dynamo/discussions)
-- **Reference**: [CLI Reference](./reference/cli) | [Glossary](./reference/glossary) | [Support Matrix](./reference/support-matrix)
diff --git a/fern/fern/pages/observability/health-checks.mdx b/fern/fern/pages/observability/health-checks.mdx
@@ -341,6 +341,6 @@ ERROR Health check request failed for generate: connection refused
 
 ## Related Documentation
 
-- [Distributed Runtime Architecture](../design_docs/distributed_runtime)
-- [Dynamo Architecture Overview](../design_docs/architecture)
+- [Distributed Runtime Architecture](../design-docs/distributed_runtime)
+- [Dynamo Architecture Overview](../design-docs/architecture)
 - [Backend Guide](../development/backend-guide)
diff --git a/fern/fern/pages/observability/logging.mdx b/fern/fern/pages/observability/logging.mdx
@@ -257,7 +257,7 @@ Notice how the `x_request_id` field appears in all log entries, alongside the `t
 
 ## Related Documentation
 
-- [Distributed Runtime Architecture](../design_docs/distributed_runtime)
-- [Dynamo Architecture Overview](../design_docs/architecture)
+- [Distributed Runtime Architecture](../design-docs/distributed_runtime)
+- [Dynamo Architecture Overview](../design-docs/architecture)
 - [Backend Guide](../development/backend-guide)
 - [Log Aggregation in Kubernetes](../kubernetes/observability/logging)
diff --git a/fern/fern/pages/observability/metrics-developer-guide.mdx b/fern/fern/pages/observability/metrics-developer-guide.mdx
@@ -266,6 +266,6 @@ DYN_SYSTEM_PORT=8081 ./server_with_callback.py
 
 - [Metrics Overview](metrics)
 - [Prometheus and Grafana Setup](prometheus-grafana)
-- [Distributed Runtime Architecture](../design_docs/distributed_runtime)
+- [Distributed Runtime Architecture](../design-docs/distributed_runtime)
 - [Python Metrics Examples](https://github.com/ai-dynamo/dynamo/tree/main/lib/bindings/python/examples/metrics/)
 
diff --git a/fern/fern/pages/observability/metrics.mdx b/fern/fern/pages/observability/metrics.mdx
@@ -233,6 +233,6 @@ Suppose the backend allows 3 concurrent requests and there are 10 clients contin
 
 ## Related Documentation
 
-- [Distributed Runtime Architecture](../design_docs/distributed_runtime)
-- [Dynamo Architecture Overview](../design_docs/architecture)
+- [Distributed Runtime Architecture](../design-docs/distributed_runtime)
+- [Dynamo Architecture Overview](../design-docs/architecture)
 - [Backend Guide](../development/backend-guide)
diff --git a/fern/fern/pages/reference/cli.mdx b/fern/fern/pages/reference/cli.mdx
@@ -159,19 +159,19 @@ The KV-aware routing arguments:
 
 ### Request Migration
 
-In a [Distributed System](#distributed-system), you can enable [request migration](../fault_tolerance/request_migration) to handle worker failures gracefully. Use the `--migration-limit` flag to specify how many times a request can be migrated to another worker:
+In a [Distributed System](#distributed-system), you can enable [request migration](../fault-tolerance/request_migration) to handle worker failures gracefully. Use the `--migration-limit` flag to specify how many times a request can be migrated to another worker:
 
 ```bash
 dynamo-run in=dyn://... out=<engine> ... --migration-limit=3
 ```
 
-This allows a request to be migrated up to 3 times before failing. See the [Request Migration Architecture](../fault_tolerance/request_migration) documentation for details on how this works.
+This allows a request to be migrated up to 3 times before failing. See the [Request Migration Architecture](../fault-tolerance/request_migration) documentation for details on how this works.
 
 ### Request Cancellation
 
 When using the HTTP interface (`in=http`), if the HTTP request connection is dropped by the client, Dynamo automatically cancels the downstream request to the worker. This ensures that computational resources are not wasted on generating responses that are no longer needed.
 
-For detailed information about how request cancellation works across the system, see the [Request Cancellation Architecture](../fault_tolerance/request_cancellation) documentation.
+For detailed information about how request cancellation works across the system, see the [Request Cancellation Architecture](../fault-tolerance/request_cancellation) documentation.
 
 ## Development