docs: migrate existing docs to fern #5445
/ok to test a3d14ef
**Walkthrough**

Adds Fern site configuration and many new documentation files (MDX) across docs for backends, APIs (nixl_connect), Kubernetes, observability, benchmarking, architecture, multimodal, KVBM, planners, and developer guides; also updates .gitignore and a GitHub filter entry.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 11
Note
Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.
🤖 Fix all issues with AI agents
In `@fern/fern/fern.config.json`:
- Around line 1-4: Update the Fern CLI version in fern.config.json by replacing
the non-existent "version": "3.42.1" value with a valid released version (e.g.,
"3.29.1"); modify the "version" field in the JSON so the project uses a
published Fern CLI release to avoid installation/build failures.
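For reference, a corrected `fern.config.json` would look like the sketch below (the organization value is illustrative; only the `version` field is the subject of this comment):

```json
{
  "organization": "your-org",
  "version": "3.29.1"
}
```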
In `@fern/fern/pages/agents/tool-calling.mdx`:
- Around line 55-72: The client example's base_url port does not match the frontend launch. Either change the OpenAI client base_url to http://localhost:8000/v1 to match the default frontend started by `python -m dynamo.frontend`, or launch the frontend with `--http-port 8081` to match the current base_url; both sides must use the same port.
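One way to apply the latter option is to pin the frontend to the port the client example already uses (flags beyond the port are omitted here):

```bash
# hypothetical launch matching the example's base_url of http://localhost:8081/v1
python -m dynamo.frontend --http-port 8081
```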
In `@fern/fern/pages/backends/sglang/sgl-hicache-example.mdx`:
- Around line 14-36: Update the example so the SGLang worker and frontend use
different ports: change the worker invocation flag "--port 8000" in the `python
-m dynamo.sglang` example to an unused port (e.g. "--port 8001") while leaving
`python -m dynamo.frontend --http-port 8000` unchanged; ensure both command
examples in the file reference the new worker port to avoid the port binding
conflict.
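The corrected split would look roughly like this (only the flags mentioned in this comment are shown; other worker flags are omitted):

```bash
# frontend keeps the default HTTP port
python -m dynamo.frontend --http-port 8000

# SGLang worker moves to an unused port
python -m dynamo.sglang --port 8001
```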
In `@fern/fern/pages/backends/vllm/multi-node.mdx`:
- Around line 84-88: The multi-line shell command is missing a trailing
backslash on the model line ("--model meta-llama/Llama-3.3-70B-Instruct"),
causing a shell syntax error; fix it by adding a trailing backslash to that line
so the command continuation lines ("--tensor-parallel-size 8 \" and
"--enforce-eager") are correctly joined into one multi-line command.
- Around line 93-97: The shell command snippet is missing a trailing backslash
on the line with the --tensor-parallel-size flag causing a syntax error; fix it
by adding a backslash at the end of the line containing "--tensor-parallel-size
8" so the command lines properly continue (keep the existing backslashes on
other lines like "--enforce-eager \" unchanged).
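The effect of the missing backslash can be seen with a harmless stand-in command: only when every non-final line ends with `\` do all flags reach a single invocation.

```shell
# echo stands in for the real launcher; all three flags arrive as one command
echo --model meta-llama/Llama-3.3-70B-Instruct \
  --tensor-parallel-size 8 \
  --enforce-eager
```

Dropping either trailing backslash would make the shell treat the next line as a separate command, which is exactly the failure described above.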
- Around line 78-98: The disaggregated example has swapped comments and flags for Node 1 and Node 2. Relabel the Node 1 block "Run ingress and decode worker" with the inline comment "Start decode worker" (its `dynamo.vllm` command should not pass --is-prefill-worker), and relabel the Node 2 block "Run prefill worker" with the inline comment "Start prefill worker" (retaining --is-prefill-worker). Keep the `python -m dynamo.frontend` line as the ingress start, and ensure --is-prefill-worker appears only in the prefill worker command.
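After the swap, the two node blocks would read along these lines (flags abbreviated to those named in this comment):

```bash
# Node 1: Run ingress and decode worker
python -m dynamo.frontend
python -m dynamo.vllm --model meta-llama/Llama-3.3-70B-Instruct   # Start decode worker

# Node 2: Run prefill worker
python -m dynamo.vllm --model meta-llama/Llama-3.3-70B-Instruct --is-prefill-worker   # Start prefill worker
```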
In `@fern/fern/pages/design-docs/distributed_runtime.mdx`:
- Around line 34-37: The admonition block beginning with ":::caution" is closed
incorrectly with triple backticks; locate the admonition start (:::caution) and
replace the closing backticks with the matching closing marker ":::", ensuring
the block is opened with ":::caution" and closed with ":::".
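The corrected admonition shape, for reference:

```mdx
:::caution
Admonition content goes here.
:::
```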
In `@fern/fern/pages/frontends/kserve.mdx`:
- Line 97: The doc contains GitHub links pointing to tree/main, which will break versioned docs. Update the two links referencing lib/llm/src/protocols/tensor.rs and the two referencing lib/bindings/python/tests/test_tensor.py (around the TensorModelConfig paragraph in kserve.mdx) to use versioned references (a specific tag like vX.Y.Z, a commit SHA, or relative repository paths) so they resolve correctly for each published doc version; replace all four occurrences consistently.
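A pinned link would follow this pattern (the tag shown is illustrative, not a confirmed release):

```md
[tensor.rs](https://github.com/ai-dynamo/dynamo/blob/v0.1.0/lib/llm/src/protocols/tensor.rs)
```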
In `@fern/fern/pages/getting-started/intro.mdx`:
- Around line 77-94: The three documentation links under "Architecture" are
pointing to the wrong directory and one has the wrong filename: update the
System Architecture link (`./design_docs/architecture`) to
`./design-docs/architecture`, update the Disaggregated Serving link
(`./design_docs/disagg_serving`) to `./design-docs/disagg-serving`, and update
the Distributed Runtime link (`./design_docs/distributed_runtime`) to
`./design-docs/distributed_runtime` so the directory uses the hyphenated name
`design-docs` and the disaggregated file uses the hyphenated filename
`disagg-serving`.
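The corrected Architecture links, per the paths above:

```md
- [System Architecture](./design-docs/architecture)
- [Disaggregated Serving](./design-docs/disagg-serving)
- [Distributed Runtime](./design-docs/distributed_runtime)
```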
In `@fern/fern/pages/getting-started/quickstart.mdx`:
- Around line 76-100: The three Markdown links in the Architecture section use
the wrong directory and file name separators; update the link targets to the
correct paths: change `./design_docs/architecture` to
`./design-docs/architecture`, change `./design_docs/disagg_serving` to
`./design-docs/disagg-serving`, and change `./design_docs/distributed_runtime`
to `./design-docs/distributed_runtime` so they point to the existing
`design-docs` directory and the hyphenated `disagg-serving` file.
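A quick way to sanity-check such relative targets is to resolve them against the page's directory; the sketch below uses a scratch tree, since the real repository layout is not available here.

```shell
# build a scratch layout mirroring the hyphenated directory name
mkdir -p /tmp/fern-pages/design-docs /tmp/fern-pages/getting-started
touch /tmp/fern-pages/design-docs/disagg-serving.mdx

page=/tmp/fern-pages/getting-started/quickstart.mdx
link=../design-docs/disagg-serving.mdx

# resolve the link relative to the page and confirm the target exists
target="$(dirname "$page")/$link"
[ -e "$target" ] && echo "resolves"
```

The same check with the underscored `design_docs` path would print nothing, matching the broken links described above.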
🟡 Minor comments (64)
fern/fern/pages/kubernetes/deployment/minikube.mdx-26-34 (1)
26-34: Provide separate commands for GPU and non-GPU setups. The command includes `--gpus all` unconditionally, but the comment says "if configured". Users without GPUs will encounter an error when running this command. Consider providing two separate commands to avoid confusion.

Suggested fix

```diff
-# Start Minikube with GPU support (if configured)
-minikube start --driver docker --container-runtime docker --gpus all --memory=16000mb --cpus=8
+# Start Minikube without GPU support
+minikube start --driver docker --container-runtime docker --memory=16000mb --cpus=8
+
+# Or, start Minikube with GPU support (if configured in step 2)
+# minikube start --driver docker --container-runtime docker --gpus all --memory=16000mb --cpus=8

 # Enable required addons
 minikube addons enable istio-provisioner
```

fern/fern/pages/guides/request_plane.mdx-163-167 (1)
163-167: Fix grammar issues in NATS usage section. Two minor issues on line 165:
- "KV based routing" should be hyphenated as "KV-based routing"
- Subject-verb agreement: "routing require" should be "routing requires"

📝 Suggested fix

```diff
 **When to use NATS:**
 - Production deployments with service discovery
-- Currently KV based routing require NATS. If you want to completely disable NATS, KV based routing won't be available
+- Currently KV-based routing requires NATS. If you want to completely disable NATS, KV-based routing won't be available
 - Need for message replay and persistence features
```

fern/fern/pages/guides/jail_stream_readme.mdx-26-27 (1)
26-27: Correct the example file path — `jail_example.rs` does not exist in the codebase. The documentation references `lib/llm/src/protocols/openai/chat_completions/jail_example.rs` for examples, but this file does not exist. The main implementation file at `lib/llm/src/protocols/openai/chat_completions/jail.rs` exists and is correct. Update the examples path to point to the actual location where examples or usage are documented (possibly `lib/llm/tests/test_jail.rs` or another file).

fern/fern/pages/frontends/kserve.mdx-12-12 (1)
12-12: Fix compound adjective hyphenation in multiple locations. Several compound adjectives should be hyphenated per standard English grammar: "industry-standard", "tensor-based", "KServe-based", and "client-side".

📝 Proposed fixes for hyphenation

Line 12:

```diff
-[KServe v2 API](https://github.com/kserve/kserve/tree/master/docs/predict-api/v2) is one of the industry standard protocol for machine learning model inference.
+[KServe v2 API](https://github.com/kserve/kserve/tree/master/docs/predict-api/v2) is one of the industry-standard protocols for machine learning model inference.
```

Line 35:

```diff
-* `ModelType::TensorBased` and `ModelInput::Tensor`: Combination for backend that is used for generic tensor based inference
+* `ModelType::TensorBased` and `ModelInput::Tensor`: Combination for backend that is used for generic tensor-based inference
```

Line 41:

```diff
-Most of the Dynamo features are tailored for LLM inference and the combinations that are backed by OpenAI API can enable those features and are best suited for exploring those Dynamo features. However, this implies specific conversion between generic tensor based messages and OpenAI message and imposes specific structure of the KServe request message.
+Most of the Dynamo features are tailored for LLM inference and the combinations that are backed by OpenAI API can enable those features and are best suited for exploring those Dynamo features. However, this implies specific conversion between generic tensor-based messages and OpenAI message and imposes specific structure of the KServe request message.
```

Line 92:

```diff
-This combination is used when the user is migrating an existing KServe based backend into Dynamo ecosystem.
+This combination is used when the user is migrating an existing KServe-based backend into Dynamo ecosystem.
```

Line 96:

```diff
-When registering the backend, the backend must provide the model's metadata as tensor based deployment is generic and the frontend can't make any assumptions like for OpenAI Completions model.
+When registering the backend, the backend must provide the model's metadata as tensor-based deployment is generic and the frontend can't make any assumptions like for OpenAI Completions model.
```

Line 98:

```diff
-* [triton_model_config](https://github.com/ai-dynamo/dynamo/tree/main/lib/llm/src/protocols/tensor.rs): For users that already have Triton model config and require the full config to be returned for client side logic, they can set the config in `TensorModelConfig::triton_model_config` which will supersedes other fields in `TensorModelConfig` and be used for endpoint responses.
+* [triton_model_config](https://github.com/ai-dynamo/dynamo/tree/main/lib/llm/src/protocols/tensor.rs): For users that already have Triton model config and require the full config to be returned for client-side logic, they can set the config in `TensorModelConfig::triton_model_config` which will supersede other fields in `TensorModelConfig` and be used for endpoint responses.
```

Also applies to: 35-35, 41-41, 92-92, 96-96, 98-98
fern/fern/pages/backends/vllm/deepseek-r1.mdx-10-10 (1)
10-10: Typo: "seperate" → "separate".📝 Proposed fix
-Dynamo supports running Deepseek R1 with data parallel attention and wide expert parallelism. Each data parallel attention rank is a seperate dynamo component that will emit its own KV Events and Metrics. vLLM controls the expert parallelism using the flag `--enable-expert-parallel` +Dynamo supports running Deepseek R1 with data parallel attention and wide expert parallelism. Each data parallel attention rank is a separate dynamo component that will emit its own KV Events and Metrics. vLLM controls the expert parallelism using the flag `--enable-expert-parallel`fern/fern/pages/kubernetes/dynamo_operator.mdx-84-107 (1)
84-107: Unclosed code block causes rendering issues. The bash code block starting at line 84 is missing a closing fence before the "Observability" section at line 97. This will cause the observability heading and subsequent content to render incorrectly (likely as part of the code block or with broken formatting).

📝 Proposed fix

````diff
   --set dynamo-operator.controllerManager.manager.image.tag=v2.0.0-beta
+```

 **Observability:**
````

fern/fern/pages/getting-started/support-matrix.mdx-72-74 (1)
72-74: Outdated release date needs updating. The callout states v0.8.0 is "planned for January 14, 2025", but that date has passed. Update to reflect current status (either released or the actual planned date).

Suggested fix

```diff
 <Callout intent="info">
-**main (ToT)** reflects the current development branch. **v0.8.0** is the upcoming release (planned for January 14, 2025) and not yet available.
+**main (ToT)** reflects the current development branch. **v0.8.0** is the upcoming release and not yet available.
 </Callout>
```

fern/fern/pages/getting-started/support-matrix.mdx-14-17 (1)
14-17: Clarify ARM64 wheel availability. The table indicates ARM64 CPU architecture is "Supported", but based on learnings, the project does not ship ARM64 wheels. Consider clarifying that ARM64 is supported via Docker images only, not pip wheels, to avoid confusion.
fern/fern/pages/getting-started/examples.mdx-55-59 (1)
55-59: Internal link paths require correction. The relative paths in the "Next Steps" section are incorrect. The directories `backends`, `kubernetes`, and `agents` are siblings of `getting-started` at the `pages` root level, not child directories. The current paths `./backends/vllm/README`, `./kubernetes/README`, and `./agents/tool-calling` will fail to resolve. Use `../` to navigate up to the pages level first.

Suggested fix

```diff
 ## Next Steps
-- See the [Backends documentation](./backends/vllm/README) for detailed backend configuration
-- Check [Kubernetes Deployment](./kubernetes/README) for production deployments
-- Review [User Guides](./agents/tool-calling) for advanced features
+- See the [Backends documentation](../backends/vllm/README) for detailed backend configuration
+- Check [Kubernetes Deployment](../kubernetes/README) for production deployments
+- Review [User Guides](../agents/tool-calling) for advanced features
```

fern/fern/pages-v0.6.0/coming-soon.mdx-7-11 (1)
7-11: Terminology inconsistency: "Latest" vs "Next". The page refers users to the "Latest" version, but in `docs.yml` the current/development version is labeled "Next" (display-name). Consider aligning the terminology to avoid confusion.

📝 Suggested fix

```diff
 <Callout intent="info">
 Documentation for this version is coming soon.
 </Callout>

-This version's documentation is being migrated. Please check back later or use the **Latest** version for the most up-to-date documentation.
+This version's documentation is being migrated. Please check back later or use the **Next** version for the most up-to-date documentation.
```

Alternatively, if "Latest" is the intended user-facing term, update the display-name in `docs.yml`.

fern/fern/pages/kvbm/kvbm_motivation.mdx-12-17 (1)
12-17: Grammar and clarity issue in bullet point. Line 15 has awkward phrasing: "Modular and need simplified UX and to be memory safe" doesn't read clearly. Consider revising for clarity.

📝 Suggested fix

```diff
 * Tailored for GenAI use-cases
 * Lack of visibility into real-time block usage patterns.
 * Need for lightweight, ownership-driven memory management over complex object stores with unneeded overheads.
-* Modular and need simplified UX and to be memory safe.
+* Need for modular, memory-safe design with simplified UX.
 * Inability to differentiate between hot (frequently accessed) and cold (infrequently accessed) blocks across the stack without intrusive application-level changes.
 * Difficulty in optimizing storage placement across heterogeneous storage tiers (for example, SSDs, object storage, and cloud storage).
```

fern/fern/pages/kubernetes/fluxcd.mdx-28-28 (1)
28-28: Fix grammatical issue in sentence. The sentence has awkward phrasing: "First, follow to [See Install..." should likely be "First, see Install Dynamo Kubernetes Platform." or similar.

Suggested fix

```diff
-First, follow to [See Install Dynamo Kubernetes Platform](./installation_guide).
+First, see [Install Dynamo Kubernetes Platform](./installation_guide).
```

fern/fern/pages/kubernetes/fluxcd.mdx-69-69 (1)
69-69: Terminology: "CRD" should be "CR".A CRD (Custom Resource Definition) defines the schema; a CR (Custom Resource) is an instance of that schema. When updating a deployment, you update the CR (DynamoGraphDeployment instance), not the CRD.
Suggested fix
-To update your pipeline, just update the associated DynamoGraphDeployment CRD +To update your pipeline, just update the associated DynamoGraphDeployment CRfern/fern/pages/kubernetes/installation_guide.mdx-333-338 (1)
333-338: Capitalize sentence beginning. The sentence starting with "just add" should begin with a capital letter for proper grammar.

Proposed fix

```diff
-just add the following to the helm install command:
+Just add the following to the helm install command:
```

fern/fern/pages/kubernetes/deployment/create_deployment.mdx-157-230 (1)
157-230: Step numbering is inconsistent - jumps from Step 3 to Step 6. The document has Steps 1, 2, and 3, but then jumps directly to Step 6 at line 230. Steps 4 and 5 are missing, which will confuse readers following the guide sequentially.

Proposed fix

```diff
-## Step 6: Deploy LoRA Adapters (Optional)
+## Step 4: Deploy LoRA Adapters (Optional)
```

Alternatively, add the missing Steps 4 and 5 if there was intended content for them.
fern/fern/pages/backends/trtllm/gpt-oss.mdx-216-216 (1)
216-216: Typo: "ususally" should be "usually".Suggested fix
-is that the application has a set of tools to aid the assistant provide accurate answer, and it is ususally +is that the application has a set of tools to aid the assistant provide accurate answer, and it is usuallyfern/fern/pages/multimodal/vllm.mdx-166-166 (1)
166-166: GitHub-style alert syntax may not render in fern MDX.The
> [!NOTE]syntax is GitHub Flavored Markdown and may not render correctly in fern's MDX environment. Consider using fern's Callout component for consistency with other callouts in this file (like lines 12-16).Suggested fix
-> [!NOTE] Disaggregation is currently only confirmed to work with LLaVA. Qwen2.5-VL is not confirmed to be supported. +<Callout intent="info"> +Disaggregation is currently only confirmed to work with LLaVA. Qwen2.5-VL is not confirmed to be supported. +</Callout>fern/fern/pages/backends/trtllm/gpt-oss.mdx-163-174 (1)
163-174: Decode worker command missing--max-batch-sizeparameter.Line 122 documents that decode-specific arguments include
--max-batch-size 128, but the manual launch command for the decode worker (lines 163-174) omits this parameter while the prefill worker includes its--max-batch-size 32.Suggested fix
CUDA_VISIBLE_DEVICES=4,5,6,7 python3 -m dynamo.trtllm \ --model-path /model \ --served-model-name openai/gpt-oss-120b \ --extra-engine-args examples/backends/trtllm/engine_configs/gpt-oss-120b/decode.yaml \ --dyn-reasoning-parser gpt_oss \ --dyn-tool-call-parser harmony \ --disaggregation-mode decode \ --max-num-tokens 16384 \ + --max-batch-size 128 \ --free-gpu-memory-fraction 0.9 \ --tensor-parallel-size 4 \ --expert-parallel-size 4fern/fern/pages/kubernetes/quickstart.mdx-176-198 (1)
176-198: InconsistentdynamoNamespacevalues in example YAML.The example shows
dynamoNamespace: my-llmforFrontend(line 178) anddynamoNamespace: dynamo-devforVllmDecodeWorker(line 185). While the doc mentions these namespaces are independent (line 23), having different values in the same deployment example may confuse users following this as a template. Consider using consistent values in this introductory example.Suggested fix
Frontend: - dynamoNamespace: my-llm + dynamoNamespace: dynamo-dev componentType: frontendfern/fern/pages/multimodal/index.mdx-18-21 (1)
18-21: Empty "Backend Documentation" section.The section header exists but contains no content (only blank lines before the Support Matrix). Either add the intended content or remove this section header.
Suggested fix (if removing)
-## Backend Documentation - - - ## Support Matrixfern/fern/pages/backends/trtllm/gpt-oss.mdx-176-176 (1)
176-176: Section numbering skips from 4 to 6.The guide jumps from "4. Launch the Deployment" directly to "6. Verify the Deployment is Ready", skipping section 5.
Suggested fix
-### 6. Verify the Deployment is Ready +### 5. Verify the Deployment is ReadyAnd update subsequent sections accordingly (6→5, 7→6, 8→7).
fern/fern/pages/fault-tolerance/request_migration.mdx-47-47 (1)
47-47: Grammar error: "This creates accumulates" is incorrect.The sentence appears to have a word missing or incorrect construction.
Suggested fix
-2. **Response Tracking**: As each response arrives from the worker, the migration system extracts the newly generated tokens and appends them to the request's token sequence. This creates accumulates all tokens that have been generated. +2. **Response Tracking**: As each response arrives from the worker, the migration system extracts the newly generated tokens and appends them to the request's token sequence. This accumulates all tokens that have been generated.fern/fern/pages/development/backend-guide.mdx-104-104 (1)
104-104: Typo: "generat" should be "generate".There's a typo in the example that should be corrected to avoid confusion.
Suggested fix
-Node 2: namespace: llama3-1-8b, component: backend, endpoint: generat, model: /data/Llama-3.1-8B-Instruct/ +Node 2: namespace: llama3-1-8b, component: backend, endpoint: generate, model: /data/Llama-3.1-8B-Instruct/fern/fern/pages/kubernetes/api_reference.mdx-17-23 (1)
17-23: Remove duplicate package description in auto-generated documentation.Lines 17 and 22 contain identical package descriptions. The duplicate originates from two Go source files both providing the same package-level documentation:
deploy/operator/api/v1alpha1/groupversion_info.godeploy/operator/api/v1alpha1/dynamographdeploymentrequest_types.goRemove the generic package description from
dynamographdeploymentrequest_types.goand retain only the DynamoGraphDeploymentRequest-specific context, keeping the description unique to that file's purpose.fern/fern/pages/multimodal/trtllm.mdx-259-259 (1)
259-259: Potential broken link after migration.The comment references
docs/backends/trtllm/README.md#build-container, but since this PR migrates documentation tofern/, this path may become invalid after the Sphinx docs are removed. Consider updating to reference the Fern documentation path or a stable external URL.fern/fern/pages/agents/tool-calling.mdx-50-50 (1)
50-50: Trailing comma in Jamba models list.The Jamba parser row ends with a trailing comma after
AI21-Jamba-*-1.7,which appears unintentional.✏️ Suggested fix
-| jamba | ai21labs/AI21-Jamba-*-1.5, ai21labs/AI21-Jamba-*-1.6, ai21labs/AI21-Jamba-*-1.7, | +| jamba | ai21labs/AI21-Jamba-*-1.5, ai21labs/AI21-Jamba-*-1.6, ai21labs/AI21-Jamba-*-1.7 |fern/fern/pages/benchmarks/kv-router-ab-testing.mdx-104-104 (1)
104-104: External link may become stale after migration.The link to
docs/kubernetes/installation_guide.mdin the main branch may break if the Sphinx docs are removed during migration. Consider updating to reference the new Fern documentation path once the migration is complete.fern/fern/pages/backends/vllm/README.mdx-59-59 (1)
59-59: Typographical error: Extra word "our".The phrase "all of our the common" contains a typo.
📝 Suggested fix
-Below we provide a guide that lets you run all of our the common deployment patterns on a single node. +Below we provide a guide that lets you run all the common deployment patterns on a single node.fern/fern/pages/backends/vllm/README.mdx-167-167 (1)
167-167: Update vLLM documentation link to use/latest/for automatic version tracking.The linked documentation for vLLM v0.9.2 is significantly outdated. The current latest version is v0.13.0 (released December 2025). Consider updating the URL to use the
/en/latest/path instead to ensure the documentation reference stays current as vLLM continues to release updates frequently.fern/fern/pages/kvbm/kvbm_components.mdx-39-42 (1)
39-42: Fix grammatical error in data flow description.Line 40 has awkward phrasing that makes the sentence incomplete.
📝 Suggested fix
**Device → Host (Offload)** -* Triggered explicitly requested to offload by the connector scheduler. +* Triggered when explicitly requested to offload by the connector scheduler. * Worker allocates a Host block and performs CUDA D2H/Custom Kernel copy.fern/fern/pages/backends/sglang/gpt-oss.mdx-10-11 (1)
10-11: Fix typo: "ues" → "use".📝 Suggested fix
The gpt-oss-120b guide for SGLang is largely identical to the [guide for vLLM](/additional-resources/backend-details/v-llm/gpt-oss), -please ues the vLLM guide as a reference with the different deployment steps as highlighted below: +please use the vLLM guide as a reference with the different deployment steps as highlighted below:fern/fern/pages/backends/trtllm/llama4_plus_eagle.mdx-25-28 (1)
25-28: Incomplete sentence in setup instructions. Line 27 ends with "based:" which appears incomplete. Consider completing the sentence for clarity.
Suggested fix
Assuming you have already allocated your nodes via `salloc`, and are inside an interactive shell on one of the allocated nodes, set the -following environment variables based: +following environment variables based on your setup:fern/fern/pages/api/nixl_connect/rdma_metadata.mdx-21-26 (1)
21-26: Incorrect link for WritableOperation. Line 24 links `WritableOperation` to `write_operation` instead of `writable_operation`. The pairing documentation should link each class to its own documentation page.
<Callout intent="success"> Classes using `RdmaMetadata` objects must be paired correctly. [`ReadableOperation`](readable_operation) with [`ReadOperation`](read_operation), and -[`WritableOperation`](write_operation) with [`WriteOperation`](write_operation). +[`WritableOperation`](writable_operation) with [`WriteOperation`](write_operation). Incorrect pairing will result in an error being raised. </Callout>fern/fern/pages/benchmarks/benchmarking.mdx-485-489 (1)
485-489: Fix numbered list - missing item 2.The troubleshooting list skips from item 1 to item 3. Either add the missing item or renumber the list sequentially.
Proposed fix
1. **Service not found**: Ensure your DynamoGraphDeployment frontend service is running -3. **PVC access**: Check that `dynamo-pvc` is properly configured and accessible -4. **Image pull issues**: Ensure the Docker image is accessible from the cluster -5. **Resource constraints**: Adjust resource limits if the job is being evicted +2. **PVC access**: Check that `dynamo-pvc` is properly configured and accessible +3. **Image pull issues**: Ensure the Docker image is accessible from the cluster +4. **Resource constraints**: Adjust resource limits if the job is being evictedfern/fern/pages/observability/README.mdx-37-37 (1)
37-37: Minor grammatical fix needed."Documentations" is non-standard; use "Documentation" instead.
Suggested fix
-## Observability Documentations +## Observability Documentationfern/fern/pages/observability/health-checks.mdx-60-62 (1)
60-62: Port inconsistency in example.The quickstart section (line 34) states the frontend default port is 8000, but this example uses port 8080. This inconsistency could confuse users.
Suggested fix
-curl -s localhost:8080/live -q | jq +curl -s localhost:8000/live -q | jqfern/fern/pages/observability/health-checks.mdx-79-85 (1)
79-85: Copy-paste error and port inconsistency.
- Line 79: The note incorrectly says "Frontend liveness" but this section is about "Frontend Health Check"
- Line 84: Uses port 8080, but should be 8000 to match the documented default
Suggested fix
-> **Note**: Frontend liveness doesn't depend on worker health or liveness only on the Frontend service itself. +> **Note**: Frontend health doesn't depend on worker health or liveness only on the Frontend service itself. ### Example Request-curl -v localhost:8080/health -q | jq
+curl -v localhost:8000/health -q | jqfern/fern/pages/observability/README.mdx-56-56 (1)
56-56: Malformed link text.The link text
do../kubernetes/observability/metrics.mdappears to be a typo or incomplete path. This should be corrected to display meaningful text.Suggested fix
-For Kubernetes-specific setup and configuration, see [do../kubernetes/observability/metrics.md](../kubernetes/observability/metrics). +For Kubernetes-specific setup and configuration, see [Kubernetes Observability Metrics](../kubernetes/observability/metrics).fern/fern/pages/api/nixl_connect/descriptor.mdx-10-11 (1)
10-11: Typo: "NIXL-base" should be "NIXL-based".Line 10 has a typo that should be corrected for consistency with other documentation.
Proposed fix
-Memory descriptor that ensures memory is registered with the NIXL-base I/O subsystem. +Memory descriptor that ensures memory is registered with the NIXL-based I/O subsystem.fern/fern/pages/api/nixl_connect/descriptor.mdx-37-39 (1)
37-39: Minor grammar fix: missing pronoun "it".Proposed fix
-When the descriptor is assigned to a NIXL operation, it will be automatically registered if was not explicitly registered. +When the descriptor is assigned to a NIXL operation, it will be automatically registered if it was not explicitly registered.fern/fern/pages/api/nixl_connect/descriptor.mdx-21-21 (1)
37-39: Minor grammar fix: missing pronoun "it".

Proposed fix
Proposed fix
- 3. From a Python `bytes` object. Memory is assumed to reside in CPU addressable host memory. + 3. From a Python `bytes` object. Memory is assumed to reside in CPU-addressable host memory.fern/fern/pages/benchmarks/sla_driven_profiling.mdx-256-256 (1)
256-256: Typo: "interplation" should be "interpolation".Proposed fix
-- `selected_decode_interpolation/decode_itl_interplation.png`: ITL vs KV usage and context length for the recommended decode engine +- `selected_decode_interpolation/decode_itl_interpolation.png`: ITL vs KV usage and context length for the recommended decode enginefern/fern/pages/backends/vllm/gpt-oss.mdx-115-116 (1)
115-116: Typo: "ususally" should be "usually".Proposed fix
-is that the application has a set of tools to aid the assistant provide accurate answer, and it is ususally -multi-turn as it involves tool selection and generation based on the tool result. Below is an example +is that the application has a set of tools to aid the assistant provide accurate answers, and it is usually +multi-turn as it involves tool selection and generation based on the tool result. Below is an examplefern/fern/pages/backends/vllm/speculative_decoding.mdx-88-105 (1)
88-105: Example output format doesn't match chat completions API response.The curl request targets
/v1/chat/completions, but the example response uses the completions format with a"text"field. The chat completions endpoint returns a"message"object instead:{ "choices": [ { "message": { "role": "assistant", "content": "..." }, "index": 0, "finish_reason": "stop" } ] }This may confuse users trying to parse the response programmatically.
📝 Suggested fix
{ "id": "cmpl-3e87ea5c-010e-4dd2-bcc4-3298ebd845a8", "choices": [ { - "text": "In cherry blossom's gentle breeze ... A delicate balance of life and death, as petals fade, and new life breathes.", + "message": { + "role": "assistant", + "content": "In cherry blossom's gentle breeze ... A delicate balance of life and death, as petals fade, and new life breathes." + }, "index": 0, "finish_reason": "stop" } ],fern/fern/pages/performance/tuning.mdx-38-41 (1)
38-41: Typo: missing word "leads". The sentence is missing a word, making it grammatically incorrect.
<Callout intent="info"> -for decode-only engines, sometimes larger number of GPUs has to larger KV cache per GPU and more decoding requests running in parallel, which leads to both better throughput/GPU and better latency/user. +For decode-only engines, sometimes a larger number of GPUs leads to larger KV cache per GPU and more decoding requests running in parallel, which leads to both better throughput/GPU and better latency/user. For example, for Llama-3.3-70b NVFP4 quantization on B200 in vLLM with 0.9 free GPU memory fraction: </Callout>Also note: sentence should start with capital "F" and include article "a" before "larger number".
fern/fern/pages/kvbm/trtllm-setup.mdx-129-133 (1)
129-133: Inconsistent metric naming/description for `h2d` suffix.

The `h2d` suffix is used inconsistently:

- Line 130: `kvbm_offload_blocks_h2d` described as "host to disk"
- Line 133: `kvbm_onboard_blocks_h2d` described as "host to device"

The standard convention is `h2d` = "host to device". Please verify the correct metric names and descriptions. If line 130 truly means "host to disk", consider renaming the metric to something like `h2disk` or `h2d_disk` for clarity.

fern/fern/pages/api/nixl_connect/writable_operation.mdx-36-37 (1)
36-37: Minor grammatical issue in code comment.

📝 Suggested fix
```diff
-    # Wait the remote worker to complete its write operation to local_tensor.
+    # Wait for the remote worker to complete its write operation to local_tensor.
```

fern/fern/pages/backends/trtllm/multinode/multinode-examples.mdx-79-83 (1)
79-83: Typo: "iamge" should be "image".

Line 81 has a typo in the comment.
Fix
```diff
 # NOTE: IMAGE must be set manually for now
-# To build an iamge, see the steps here:
+# To build an image, see the steps here:
 # https://github.com/ai-dynamo/dynamo/tree/main/docs/backends/trtllm/README.md#build-container
 export IMAGE="<dynamo_trtllm_image>"
```

fern/fern/pages/design-docs/architecture.mdx-28-28 (1)
28-28: Grammar: subject-verb agreement.

"A disaggregated approach that separate" should be "separates" (singular verb to match "approach").
Fix
```diff
-- *GPU underutilization*: Traditional monolithic inference pipelines often leave GPUs idle due to the imbalance between prefill and decode stages. Prefill (which generates large prompt embeddings) is highly compute-intensive, while decode (which generates tokens) is latency-sensitive. A disaggregated approach that separate prefill and decode ensures optimal GPU utilization and increases overall throughput ([DistServe](https://arxiv.org/abs/2401.09670)).
+- *GPU underutilization*: Traditional monolithic inference pipelines often leave GPUs idle due to the imbalance between prefill and decode stages. Prefill (which generates large prompt embeddings) is highly compute-intensive, while decode (which generates tokens) is latency-sensitive. A disaggregated approach that separates prefill and decode ensures optimal GPU utilization and increases overall throughput ([DistServe](https://arxiv.org/abs/2401.09670)).
```

fern/fern/pages/design-docs/architecture.mdx-80-80 (1)
80-80: Typo: "preceeding" should be "preceding".

Also, "KV aware routing" should be hyphenated as "KV-aware routing" for consistency with compound adjective usage elsewhere in the document.
Fix
```diff
-Existing routing methods, including load-based routing, overlook the specific properties of LLMs that could improve performance. Addressing this, routing user queries to workers with the highest KV cache hit rate (rather than simply the least busy node) allows for immediate processing, even under heavy load. The preceeding figures illustrate the effectiveness of KV aware routing on 100,000 real R1 user queries, achieving a 3x improvement in TTFT and a 2x reduction in average request latency. Depending on traffic, this approach can also enhance throughput.
+Existing routing methods, including load-based routing, overlook the specific properties of LLMs that could improve performance. Addressing this, routing user queries to workers with the highest KV cache hit rate (rather than simply the least busy node) allows for immediate processing, even under heavy load. The preceding figures illustrate the effectiveness of KV-aware routing on 100,000 real R1 user queries, achieving a 3x improvement in TTFT and a 2x reduction in average request latency. Depending on traffic, this approach can also enhance throughput.
```

fern/fern/pages/backends/trtllm/multinode/multinode-examples.mdx-210-223 (1)
210-223: Typo: "succesfully" should be "successfully".

Line 213 has a spelling error.
Fix
```diff
 You can see each rank's output prefixed with the rank at the start of each log line
-until the model succesfully finishes loading:
+until the model successfully finishes loading:
```

fern/fern/pages/kubernetes/README.mdx-87-108 (1)
87-108: Minor: Redundant namespace creation.

Line 91 creates the namespace with `kubectl create namespace ${NAMESPACE}`, but the platform installation step (line 65) already uses `--create-namespace`. If users follow both sections sequentially with the same namespace, the explicit `kubectl create namespace` will fail with an "already exists" error.

Consider either removing line 91 or adding a note that this step is only needed if deploying to a different namespace than the platform.
Suggested clarification
````diff
 ## 3. Deploy Your First Model

 ```bash
-export NAMESPACE=dynamo-system
-kubectl create namespace ${NAMESPACE}
+# Use same namespace as platform, or create a new one for model isolation
+export NAMESPACE=dynamo-system  # or your preferred namespace
+# kubectl create namespace ${NAMESPACE}  # Only if using a different namespace

 # to pull model from HF
````

fern/fern/pages/kubernetes/README.mdx-177-198 (1)
177-198: Inconsistent dynamoNamespace values in example.

The example shows `Frontend` using `dynamoNamespace: my-llm` (line 178) while `VllmDecodeWorker` uses `dynamoNamespace: dynamo-dev` (line 185). Based on the terminology section (lines 18-22), components within the same deployment typically share a Dynamo namespace for service discovery.

Consider using the same `dynamoNamespace` value for both services to avoid confusion, or add a comment explaining when different namespaces would be appropriate.

Suggested fix
```diff
 Frontend:
   dynamoNamespace: my-llm
   componentType: frontend
   replicas: 1
   extraPodSpec:
     mainContainer:
       image: your-image
 VllmDecodeWorker: # or SGLangDecodeWorker, TrtllmDecodeWorker
-  dynamoNamespace: dynamo-dev
+  dynamoNamespace: my-llm # Should match Frontend for service discovery
   componentType: worker
```

fern/fern/pages/design-docs/architecture.mdx-44-48 (1)
44-48: Fix incorrect link reference: use hyphen instead of underscore.

The link `disagg_serving` does not match the actual file `disagg-serving.mdx`. Change line 44 to:

```
- [Dynamo Disaggregated Serving](disagg-serving)
```

fern/fern/pages/kvbm/kvbm_design_deepdive.mdx-226-226 (1)
226-226: Minor grammar fix: hyphenate "high-level".

"High level" should be hyphenated when used as a compound adjective before a noun.
Suggested fix
```diff
-Now, to enable fast lookup and dynamic tiering, storage vendors may build internal data structures using the received event stream. Here is a high level conceptual design:
+Now, to enable fast lookup and dynamic tiering, storage vendors may build internal data structures using the received event stream. Here is a high-level conceptual design:
```

fern/fern/pages/api/nixl_connect/README.mdx-46-48 (1)
46-48: Fix grammatical error in the description.

Line 48 is missing a word. "registered by a remote worker to writable" should be "registered by a remote worker to be writable" or "registered by a remote worker as writable."
Suggested fix
```diff
 4. **Write to registered, remote memory**:
-   Write local memory buffer(s) to remote memory buffer(s) registered by a remote worker to writable.
+   Write local memory buffer(s) to remote memory buffer(s) registered by a remote worker as writable.
```

fern/fern/pages/kvbm/kvbm_design_deepdive.mdx-28-31 (1)
28-31: Fix typo: missing space in "BlockLayouttrait".

Line 30 has a typo where "BlockLayout" and "trait" are concatenated without a space.
Suggested fix
```diff
-Each block is a 2D array `[num_layers][page_size × inner_dim]`. `BlockLayouttrait` abstracts the memory layout. The default implementation,`FullyContiguous`, stores all layers for all blocks in one region with alignment-aware stride computation:
+Each block is a 2D array `[num_layers][page_size × inner_dim]`. The `BlockLayout` trait abstracts the memory layout. The default implementation, `FullyContiguous`, stores all layers for all blocks in one region with alignment-aware stride computation:
```

fern/fern/pages/backends/trtllm/README.mdx-59-62 (1)
59-62: Grammar: extra word "the".

Line 61: "all of our the common" should be "all of the common" or "all our common".
🔤 Proposed fix
```diff
-Below we provide a guide that lets you run all of our the common deployment patterns on a single node.
+Below we provide a guide that lets you run all the common deployment patterns on a single node.
```

fern/fern/pages/design-docs/disagg-serving.mdx-78-81 (1)
78-81: Typo: "comptued" should be "computed".

Line 80 contains a typo in the mermaid diagram label.
🔤 Proposed fix
- P-->>D: Remote NIXL write for comptued KV blocks (non-block) + P-->>D: Remote NIXL write for computed KV blocks (non-block)fern/fern/pages/planner/sla_planner_quickstart.mdx-471-485 (1)
471-485: Minor: Missing period after "etc".

In American English style, "etc" should have a period.
🔤 Proposed fix
```diff
-By default, profiling jobs save essential data to ConfigMaps for planner integration. For advanced users who need access to detailed artifacts (logs, performance plots, AIPerf results, etc), configure the DGDR to use `dynamo-pvc`.
+By default, profiling jobs save essential data to ConfigMaps for planner integration. For advanced users who need access to detailed artifacts (logs, performance plots, AIPerf results, etc.), configure the DGDR to use `dynamo-pvc`.
```

fern/fern/pages/kvbm/kvbm_architecture.mdx-17-18 (1)
17-18: Typo: "eviction was on policies" appears corrupted.

This phrase doesn't make grammatical sense. It likely should be "eviction based on policies" or "eviction policies".
🔤 Proposed fix
```diff
-The middle layer, the KVBM layer, encapsulates the core logic of the KV block manager and serves as the runtime substrate for managing block memory. The KVBM adapter layer normalizes the representations and data layout for the incoming requests across runtimes and forwards them to the core memory manager. The KVBM and the core modules implement required internal functionality, such as table lookups, memory allocation, block layout management, lifecycle, and state transitions and block reuse or eviction was on policies.
+The middle layer, the KVBM layer, encapsulates the core logic of the KV block manager and serves as the runtime substrate for managing block memory. The KVBM adapter layer normalizes the representations and data layout for the incoming requests across runtimes and forwards them to the core memory manager. The KVBM and the core modules implement required internal functionality, such as table lookups, memory allocation, block layout management, lifecycle, state transitions, and block reuse or eviction based on policies.
```

fern/fern/pages/backends/trtllm/README.mdx-163-173 (1)
163-173: Duplicate sections: Client and Benchmarking appear twice.

Lines 163-173 (Client and Benchmarking) are duplicated at lines 207-216. Consider removing one set.
🔧 Suggested fix
Remove the duplicate "Client" and "Benchmarking" sections (lines 207-216), keeping only the first occurrence (lines 163-173). Alternatively, if the second occurrence is intentional as a summary section, remove the first occurrence to avoid redundancy in the Table of Contents flow.
```diff
 ## Client

 See [client](../../backends/sglang/README#testing-the-deployment) section to learn how to send request to the deployment.

 NOTE: To send a request to a multi-node deployment, target the node which is running `python3 -m dynamo.frontend <args>`.

 ## Benchmarking

 To benchmark your deployment with AIPerf, see this utility script, configuring the
 `model` name and `host` based on your deployment: [perf.sh](https://github.com/ai-dynamo/dynamo/tree/main/benchmarks/llm/perf.sh)

-## Client
-
-See [client](../../backends/sglang/README#testing-the-deployment) section to learn how to send request to the deployment.
-
-NOTE: To send a request to a multi-node deployment, target the node which is running `python3 -m dynamo.frontend <args>`.
-
-## Benchmarking
-
-To benchmark your deployment with AIPerf, see this utility script, configuring the
-`model` name and `host` based on your deployment: [perf.sh](https://github.com/ai-dynamo/dynamo/tree/main/benchmarks/llm/perf.sh)
```

fern/fern/pages/planner/sla_planner.mdx-66-73 (1)
66-73: Grammar: subject-verb agreement.

Line 68: "three load prediction model is supported" should be "three load prediction models are supported".
🔤 Proposed fix
```diff
-The SLA planner use load predictor to predict the number of requests, ISL, and OSL in the next adjustment interval. Currently, three load prediction model is supported:
+The SLA planner uses a load predictor to predict the number of requests, ISL, and OSL in the next adjustment interval. Currently, three load prediction models are supported:
```

fern/fern/pages/planner/sla_planner.mdx-96-105 (1)
96-105: Minor grammar fixes needed.
- Line 99: "maybe" should be "may be" (two words)
- Line 102: "calculate" should be "calculates"
🔤 Proposed fixes
```diff
-- **ITL**: actual ITL maybe affected by chunked small prefill request in decode engine.
+- **ITL**: actual ITL may be affected by chunked small prefill requests in the decode engine.
```

```diff
-SLA planner calculate the correction factor with
+SLA planner calculates the correction factor with
```
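For intuition, a correction factor of this kind typically rescales predictions by the observed-to-predicted ratio; a generic sketch follows (this is an illustration of the idea, not the SLA planner's actual formula):

```python
def correction_factor(observed_itl_ms: float, predicted_itl_ms: float) -> float:
    """Ratio of observed to predicted inter-token latency; a value > 1 means
    latency was underestimated, so future predictions are scaled up."""
    return observed_itl_ms / predicted_itl_ms

# Example: observed 12 ms vs predicted 10 ms -> factor 1.2
factor = correction_factor(12.0, 10.0)
corrected_prediction = 10.0 * factor  # 12.0
```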
@coderabbitai we'll address content issues in a follow-up PR. For now, review for issues related to
🚨 Critical Issues

1. Missing Version Configuration Files

The configuration references version files that do not exist:

```yml
versions:
  - display-name: Next
    path: ./versions/latest.yml
  - display-name: v0.7.1
    path: ./versions/v0.7.1.yml
  # ... (all 16 version files are missing)
```

Impact: Fern build will fail without these version configuration files.

2. Unbalanced Code Fences (Will break rendering)
3. Missing Frontmatter
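Issue 1 above is mechanically checkable; a small sketch (run from the directory containing `docs.yml`; the two paths are taken from the snippet, and the remaining version files would be listed the same way):

```python
from pathlib import Path

# Version files referenced by the docs configuration (subset shown above).
referenced = ["versions/latest.yml", "versions/v0.7.1.yml"]

# Report any referenced file that is absent on disk.
missing = [f for f in referenced if not Path(f).is_file()]
for f in missing:
    print("missing:", f)
```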
d4d16ef to 5702911 Compare
Wow, big PR!
257be87 to 7d21ec5 Compare
@grahamking Hey! Here's a link to Fern. Initially, I implemented this with Docusaurus, but I spoke with @nealvaidya and he's the one who suggested I use Fern because there's existing work with NVIDIA to pay for Fern's doc hosting services. Yes, the plan is to replace Sphinx and move the Fern docs into the docs/ folder; right now I just made the fern/ folder so we can start building the site without taking down the existing Sphinx setup.
@coderabbitai review
✅ Actions performed

Review triggered.
Actionable comments posted: 12
Note
Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.
🤖 Fix all issues with AI agents
In `@fern/fern/pages/api/nixl_connect/device.mdx`:
- Around line 30-32: Replace all internal links in
fern/pages/api/nixl_connect/device.mdx that use kebab-case with the actual
underscore filenames: change
/additional-resources/api-reference/nixl-connect/device-kind →
/additional-resources/api-reference/nixl-connect/device_kind, /operation-status
→ /operation_status, /read-operation → /read_operation, /readable-operation →
/readable_operation, /writable-operation → /writable_operation, and
/rdma-metadata → /rdma_metadata; update the anchor links shown in the diff (the
two links in the device description and the links later in the file) so they
reference the underscore versions to match the actual .mdx file names.
In `@fern/fern/pages/backends/sglang/README.mdx`:
- Around line 39-44: Update the six broken links in the feature support matrix
by replacing the incorrect paths with the corrected ones: change
"/design-docs/disaggregated-serving" to "/design-docs/disagg-serving" for both
occurrences on lines containing "Disaggregated Serving" and "Conditional
Disaggregation"; change "/additional-resources/router-details/kv-cache-routing"
(the "KV-Aware Routing" link) to "/router/kv_cache_routing"; change
"/components/planner/sla-based-planner" (the "SLA-Based Planner" link) to
"/planner/sla_planner"; change
"/additional-resources/multimodal-details/sg-lang" (the "Multimodal Support"
link) to "/multimodal/sglang"; and change "/components/kvbm/architecture" (the
"KVBM" link) to "/kvbm/kvbm_architecture".
In `@fern/fern/pages/backends/trtllm/README.mdx`:
- Around line 44-49: Update the broken internal links in
fern/pages/backends/trtllm/README.mdx to use the correct path prefixes and file
names: replace "/design-docs/disaggregated-serving" with
"/design-docs/disagg-serving",
"/additional-resources/router-details/kv-cache-routing" with
"/router/kv-cache-routing", "/components/planner/sla-based-planner" with
"/planner/sla-based-planner", "/additional-resources/load-planner" with
"/planner/load-planner" (matching load_planner.mdx), and
"/components/kvbm/architecture" with "/kvbm/architecture" (matching
kvbm_architecture.mdx); apply the same corrections for the identical link
patterns found in backends/sglang/README.mdx, backends/vllm/README.mdx, and
design-docs/architecture.mdx so navigation entries match next.yml mapping.
In `@fern/fern/pages/benchmarks/sla_driven_profiling.mdx`:
- Line 11: Update the broken internal links in the MDX by replacing the old
paths with the corrected ones: change the link with text "SLA-Driven Profiling
and Planner Deployment Quick Start Guide" (currently pointing to
/components/planner/sla-planner-quick-start) to /planner/sla-planner-quickstart;
update any links pointing to /components/planner/sla-based-planner to
/planner/sla-planner; change links pointing to
/user-guides/tuning-disaggregated-performance to /performance/tuning; change
/additional-resources/advanced-kubernetes/api-reference to
/kubernetes/api-reference; and for the link currently pointing to
/kubernetes-deployment/observability-k-8-s/metrics, verify the correct target
with the docs team and either fix to the proper metrics path or remove/flag the
link if no matching doc exists (search for link text "metrics" or
"observability" in the file to locate it).
In `@fern/fern/pages/development/backend-guide.mdx`:
- Around line 143-165: The two internal links pointing to
/additional-resources/... are broken; update the link targets for "Request
Migration Architecture" and "Request Cancellation Architecture" so they point to
the correct internal paths (/fault-tolerance/request-migration and
/fault-tolerance/request-cancellation) by replacing the strings
"/additional-resources/fault-tolerance/request-migration" and
"/additional-resources/fault-tolerance/request-cancellation" in the Request
Migration and Request Cancellation sections of backend-guide.mdx.
In `@fern/fern/pages/getting-started/quickstart.mdx`:
- Around line 77-100: Update the broken internal links in the Documentation
Overview list: replace backend paths `/components/backends/v-llm`,
`/components/backends/sg-lang`, and `/components/backends/tensor-rt-llm` with
`/components/backends/vllm`, `/components/backends/sglang`, and
`/components/backends/tensorrt-llm` respectively; update user-guide paths
`/user-guides/tuning-disaggregated-performance` →
`/user-guides/disaggregation-and-performance-tuning` and
`/user-guides/finding-best-initial-configs` →
`/user-guides/finding-best-initial-configs-using-aiconfigurator`; and remove or
replace the non-existent `/additional-resources/cli-reference` entry (in the
same list block that contains "Performance & Tuning" and "Getting Help") with an
existing valid page or omit it.
In `@fern/fern/pages/kubernetes/deployment/create_deployment.mdx`:
- Line 154: Update the incomplete sentence and broken link in the line
containing "If you are a Dynamo contributor the [dynamo run
guide](/additional-resources/cli-reference)"; change the link target to
/reference/cli and insert the missing verb and punctuation so it reads like "If
you are a Dynamo contributor, see the [dynamo run guide](/reference/cli) for
details on how to run this command."
In `@fern/fern/pages/kvbm/kvbm_design_deepdive.mdx`:
- Around line 1-3: The frontmatter 'title' in kvbm_design_deepdive.mdx is
incorrect (it currently reads "KVBM components"); update the YAML frontmatter
title field to a correct, descriptive title that matches this file (e.g., "KVBM
design deep dive") so the page title, navigation, and browser tab reflect the
file's purpose; locate and edit the top-of-file frontmatter 'title' key to the
new value.
In `@fern/fern/pages/kvbm/kvbm_integrations.mdx`:
- Around line 22-23: In kvbm_integrations.mdx update the two internal links that
point to /components/kvbm/kvbm-in-v-llm and /components/kvbm/kvbm-in-trtllm so
they match the actual target filenames used in this PR (vllm-setup.mdx and
trtllm-setup.mdx); either change the hrefs to the correct paths that resolve to
vllm-setup and trtllm-setup (e.g., /components/kvbm/vllm-setup and
/components/kvbm/trtllm-setup) or update the docs.yml navigation to create
aliases for the existing paths—ensure the link targets in kvbm_integrations.mdx
exactly match the resolved route names used by Fern.
In `@fern/fern/pages/kvbm/vllm-setup.mdx`:
- Line 12: The internal link "/components/kvbm/architecture" in vllm-setup.mdx
is using the same broken pattern as trtllm-setup.mdx; open vllm-setup.mdx and
replace that href with the correct Fern-docs path that matches the site's
navigation (use the same corrected path you applied in trtllm-setup.mdx),
ensuring the link target (the "/components/kvbm/architecture" string) matches an
existing page slug in the repository and updates any relative/absolute pathing
accordingly.
In `@fern/fern/pages/multimodal/trtllm.mdx`:
- Line 36: Update the broken internal link in the TRT-LLM page by replacing the
currently referenced path
"/user-guides/multimodality-support#architecture-patterns" with the correct
internal path "/multimodal#architecture-patterns"; locate the sentence
containing "TRT-LLM supports aggregated and traditional disaggregated patterns"
(the link on "Architecture Patterns") in trtllm.mdx and make the same
replacement in the other files that contain the identical broken link (vllm.mdx
and sglang.mdx).
In `@fern/fern/pages/observability/metrics.mdx`:
- Around line 111-116: Update the metric names to use the project's `_total`
suffix convention: rename `dynamo_component_inflight_requests` and
`dynamo_frontend_inflight_requests` to
`dynamo_component_inflight_requests_total` and
`dynamo_frontend_inflight_requests_total`, and audit the other listed metrics
(`dynamo_component_request_bytes_total`,
`dynamo_component_request_duration_seconds`, `dynamo_component_requests_total`,
`dynamo_component_response_bytes_total`, `dynamo_component_uptime_seconds`) to
ensure any gauge-style metrics follow the `_total` naming (also apply same
changes mentioned for lines 153-156). Locate and update occurrences in the
observability docs and any corresponding metric export code so names match
exactly.
♻️ Duplicate comments (2)
fern/fern/pages/frontends/kserve.mdx (1)
97-102: Use version-stable GitHub links (avoid `tree/main`).

These links will drift as `main` changes and break versioned docs. Please pin to tags/SHAs or use version-relative references for all `main` links in this section.

Also applies to: 106-106
fern/fern/pages/kubernetes/autoscaling.mdx (1)
47-47: Inconsistent documentation about DGDSA default behavior.

The documentation contains contradictory statements:
- Line 47: "the operator automatically creates one adapter per service"
- Line 102: "When DGDSA is enabled (the default)"
- Line 127: "By default, no DGDSA is created for services"
- Line 594-596: "With DGDSA Enabled (Default)"
Please clarify the actual default behavior and ensure consistency throughout the document.
Also applies to: 127-128
🟡 Minor comments (35)
fern/fern/pages/frontends/kserve.mdx-12-12 (1)
12-12: Hyphenate compound modifiers for readability.

Examples: "industry-standard", "tensor-based", "client-side".
✏️ Suggested edits
```diff
-[KServe v2 API](https://github.com/kserve/kserve/tree/master/docs/predict-api/v2) is one of the industry standard protocol for machine learning model inference.
+[KServe v2 API](https://github.com/kserve/kserve/tree/master/docs/predict-api/v2) is one of the industry-standard protocol for machine learning model inference.

-* `ModelType::TensorBased` and `ModelInput::Tensor`: Combination for backend that is used for generic tensor based inference
+* `ModelType::TensorBased` and `ModelInput::Tensor`: Combination for backend that is used for generic tensor-based inference

-... specific conversion between generic tensor based messages ...
+... specific conversion between generic tensor-based messages ...

-This combination is used when the user is migrating an existing KServe based backend ...
+This combination is used when the user is migrating an existing KServe-based backend ...

-... metadata as tensor based deployment is generic ...
+... metadata as tensor-based deployment is generic ...

-... returned for client side logic ...
+... returned for client-side logic ...
```

Also applies to: 35-35, 41-41, 92-98
fern/fern/pages/guides/request_plane.mdx-163-167 (1)
163-167: Fix hyphenation for compound adjective.

"KV based routing" should be "KV-based routing" (hyphenated compound adjective). This appears twice in the NATS usage section. Based on static analysis hints.
📝 Suggested fix
```diff
 **When to use NATS:**
 - Production deployments with service discovery
-- Currently KV based routing require NATS. If you want to completely disable NATS, KV based routing won't be available
+- Currently KV-based routing requires NATS. If you want to completely disable NATS, KV-based routing won't be available
 - Need for message replay and persistence features
```

Note: Also corrected "require" → "requires" for subject-verb agreement.
fern/fern/pages/development/backend-guide.mdx-104-104 (1)
104-104: Typo in example: "generat" should be "generate".

📝 Suggested fix
```diff
-Node 2: namespace: llama3-1-8b, component: backend, endpoint: generat, model: /data/Llama-3.1-8B-Instruct/
+Node 2: namespace: llama3-1-8b, component: backend, endpoint: generate, model: /data/Llama-3.1-8B-Instruct/
```

fern/fern/pages/benchmarks/kv-router-ab-testing.mdx-104-104 (1)
104-104: Update link to internal Fern documentation.

The link on line 104 should point to the internal Fern documentation page instead of the external GitHub URL. An installation guide exists at `/kubernetes/installation_guide` in the Fern docs. Update the link from `https://github.com/ai-dynamo/dynamo/blob/main/docs/kubernetes/installation_guide.md` to the relative path.

fern/fern/pages/kvbm/kvbm_architecture.mdx-17-17 (1)
17-17: Typo: "eviction was on policies" appears garbled.

The phrase "eviction was on policies" doesn't make grammatical sense. This likely should be "eviction based on policies" or similar.
Suggested fix
```diff
-The middle layer, the KVBM layer, encapsulates the core logic of the KV block manager and serves as the runtime substrate for managing block memory. The KVBM adapter layer normalizes the representations and data layout for the incoming requests across runtimes and forwards them to the core memory manager. The KVBM and the core modules implement required internal functionality, such as table lookups, memory allocation, block layout management, lifecycle, and state transitions and block reuse or eviction was on policies. The KVBM layer also has required abstractions for external components to override or augment its behavior.
+The middle layer, the KVBM layer, encapsulates the core logic of the KV block manager and serves as the runtime substrate for managing block memory. The KVBM adapter layer normalizes the representations and data layout for the incoming requests across runtimes and forwards them to the core memory manager. The KVBM and the core modules implement required internal functionality, such as table lookups, memory allocation, block layout management, lifecycle, and state transitions and block reuse or eviction based on policies. The KVBM layer also has required abstractions for external components to override or augment its behavior.
```

fern/fern/pages/planner/load_planner.mdx-30-32 (1)
30-32: Fix duplicate list numbering.
Line 31 and Line 32 both use `1.`; should be `1.` and `2.`.

✏️ Suggested edit
```diff
-1. After a new decode worker is added, since it needs time to populate the kv cache, planner doesn't scale down the number of decode workers in the next `NEW_DECODE_WORKER_GRACE_PERIOD=3` adjustment intervals.
-1. We do not scale up prefill worker if the prefill queue size is estimated to reduce below the `--prefill-queue-scale-up-threshold` within the next `NEW_PREFILL_WORKER_QUEUE_BUFFER_PERIOD=3` adjustment intervals following the trend observed in the current adjustment interval.
+1. After a new decode worker is added, since it needs time to populate the kv cache, planner doesn't scale down the number of decode workers in the next `NEW_DECODE_WORKER_GRACE_PERIOD=3` adjustment intervals.
+2. We do not scale up prefill worker if the prefill queue size is estimated to reduce below the `--prefill-queue-scale-up-threshold` within the next `NEW_PREFILL_WORKER_QUEUE_BUFFER_PERIOD=3` adjustment intervals following the trend observed in the current adjustment interval.
```

fern/fern/pages/performance/tuning.mdx-10-40 (1)
10-40: Tighten a few grammar/wording typos in the intro/Callout.
Small edits improve readability without changing meaning (Line 12, Line 26, Line 39).

✏️ Suggested edits
```diff
-Specifically, there are three sets of parameters that needs to be tuned:
+Specifically, there are three sets of parameters that need to be tuned:

-The next thing to decide is how many numbers of GPU to serve the model.
+The next thing to decide is how many GPUs to serve the model.

-For decode-only engines, sometimes larger number of GPUs has to larger KV cache per GPU and more decoding requests running in parallel, which leads to both better throughput/GPU and better latency/user.
+For decode-only engines, a larger number of GPUs can yield larger KV cache per GPU and more decoding requests running in parallel, which leads to both better throughput/GPU and better latency/user.
```

fern/fern/pages/design-docs/architecture.mdx-42-49 (1)
42-49: Fix internal link slug for disaggregated serving design doc.
The link at line 44 uses `/design-docs/disaggregated-serving`, but the file is named `disagg-serving.mdx` with no explicit slug override. Update the link to `/design-docs/disagg-serving` to avoid a 404.

fern/fern/pages/design-docs/disagg-serving.mdx-78-81 (1)
78-81: Fix typo: "comptued" → "computed".

Line 80 contains a spelling error in the diagram message.
📝 Suggested fix
```diff
 P-->>D: Remote NIXL read for prefix hit KV blocks (non-block)
 P->>P: Execute prefill
-P-->>D: Remote NIXL write for comptued KV blocks (non-block)
+P-->>D: Remote NIXL write for computed KV blocks (non-block)
```

fern/fern/pages/design-docs/disagg-serving.mdx-89-89 (1)
89-89: Fix subject-verb agreement: "leverage" → "leverages".

📝 Suggested fix
```diff
-The key to high-performance disaggregation is efficient KV transfer. Dynamo leverage NIXL to transfer KV cache directly from the VRAM of prefill engine to the VRAM of decode engine. In addition, the KV transfer is non-blocking, allowing GPU forward pass to serve other requests in addition to the KV transfer.
+The key to high-performance disaggregation is efficient KV transfer. Dynamo leverages NIXL to transfer KV cache directly from the VRAM of prefill engine to the VRAM of decode engine. In addition, the KV transfer is non-blocking, allowing GPU forward pass to serve other requests in addition to the KV transfer.
```

fern/fern/pages/design-docs/distributed_runtime.mdx-46-46 (1)
46-46: Fix grammar: "isn't be registered" → "isn't registered".

There's a grammatical error in this sentence.
📝 Suggested fix
```diff
-- `Component`: When a `Component` object is created, similar to `Namespace`, it isn't be registered in etcd. When `create_service` is called, it creates a NATS service group using `{namespace_name}.{service_name}` as the service identifier and registers a service in the registry of the `Component`, where the registry is an internal data structure that tracks all services and endpoints within the `DistributedRuntime`.
+- `Component`: When a `Component` object is created, similar to `Namespace`, it isn't registered in etcd. When `create_service` is called, it creates a NATS service group using `{namespace_name}.{service_name}` as the service identifier and registers a service in the registry of the `Component`, where the registry is an internal data structure that tracks all services and endpoints within the `DistributedRuntime`.
```

fern/fern/pages/kubernetes/deployment/minikube.mdx-10-10 (1)
10-10: Clarify setup wording and make GPU flag optional (Line 10, Line 26–Line 28).
Avoids confusion and prevents CPU-only users from hitting a failure.

✏️ Suggested edits
-This guide walks through the set up of everything you need to run Dynamo Kubernetes Platform locally.
+This guide walks through the setup you need to run Dynamo Kubernetes Platform locally.

-# Start Minikube with GPU support (if configured)
-minikube start --driver docker --container-runtime docker --gpus all --memory=16000mb --cpus=8
+# Start Minikube (omit --gpus all if you aren't using GPU support)
+minikube start --driver docker --container-runtime docker --gpus all --memory=16000mb --cpus=8

Also applies to: 26-28
fern/fern/pages/backends/vllm/README.mdx-10-10 (1)
10-10: Minor wording fixes for readability (Line 10, Line 59, Line 173).
These are small grammar/consistency tweaks.

✏️ Proposed doc wording edits
-... NIXL based transfer mechanisms ...
+... NIXL-based transfer mechanisms ...

-Below we provide a guide that lets you run all of our the common deployment patterns on a single node.
+Below we provide a guide that lets you run all of the common deployment patterns on a single node.

-... Python's builtin hashing ...
+... Python's built-in hashing ...

Also applies to: 59-59, 173-173
fern/fern/pages/backends/vllm/deepseek-r1.mdx-10-14 (1)
10-14: Fix a few typos/grammar issues (Line 10–Line 14).
Improves professionalism and readability.

✏️ Suggested edits
-Dynamo supports running Deepseek R1 with data parallel attention and wide expert parallelism. Each data parallel attention rank is a seperate dynamo component that will emit its own KV Events and Metrics.
+Dynamo supports running Deepseek R1 with data parallel attention and wide expert parallelism. Each data parallel attention rank is a separate Dynamo component that will emit its own KV Events and Metrics.

-The following script can be adapted to run Deepseek R1 with a variety of different configuration.
+The following script can be adapted to run Deepseek R1 with a variety of configurations.

fern/fern/pages/backends/vllm/gpt-oss.mdx-115-115 (1)
115-115: Fix typo: "ususally" → "usually".

✏️ Suggested fix
-is that the application has a set of tools to aid the assistant provide accurate answer, and it is ususally
+is that the application has a set of tools to aid the assistant provide accurate answer, and it is usually

fern/fern/pages/backends/trtllm/multinode/multinode-examples.mdx-213-213 (1)
213-213: Typo: "succesfully" should be "successfully".

📝 Suggested fix
- until the model succesfully finishes loading:
+ until the model successfully finishes loading:

fern/fern/pages/kubernetes/deployment/create_deployment.mdx-230-230 (1)
230-230: Step numbering jumps from Step 3 to Step 6.

The document has Steps 1, 2, 3, then jumps directly to Step 6 for LoRA deployment. This suggests either missing intermediate steps or a renumbering oversight.
📝 Suggested fix
-## Step 6: Deploy LoRA Adapters (Optional)
+## Step 4: Deploy LoRA Adapters (Optional)

Alternatively, if Steps 4-5 exist elsewhere and were removed, ensure the numbering is sequential.
fern/fern/pages/backends/trtllm/multinode/multinode-examples.mdx-81-82 (1)
81-82: Typo: "iamge" should be "image".

📝 Suggested fix
 # NOTE: IMAGE must be set manually for now
-# To build an iamge, see the steps here:
+# To build an image, see the steps here:
43-44: Path references old Sphinx docs location.The link
docs/backends/trtllm/README.mdreferences the current Sphinx documentation path. After the Fern migration completes andfern/replacesdocs/, this link will break. Consider using a relative Fern path or noting this needs updating post-migration.Similarly affected: lines 82-83 reference the same
docs/path pattern.fern/fern/pages/multimodal/sglang.mdx-336-340 (1)
336-340: Clarify NIXL usage for E/P/D mode in the table.The table states E/P/D transfers embeddings to "Prefill" but line 142 and the workflow diagram show embeddings go to the Decode Worker first (which is the entry point), then Decode coordinates with Prefill. This creates a potential inconsistency.
📝 Suggested fix for accuracy
| Use Case | NIXL Used? | Data Transfer | Notes | |----------|------------|---------------|-------| | E/PD (Encode Separate) | Yes | Encoder → PD (embeddings) | Vision encoder separate | -| E/P/D (Full Disaggregation) | Yes | Encoder → Prefill (embeddings) | KV cache via SGLang bootstrap | +| E/P/D (Full Disaggregation) | Yes | Encoder → Decode (embeddings) | KV cache via SGLang bootstrap |fern/fern/pages/backends/trtllm/gpt-oss.mdx-216-216 (1)
216-216: Typo: "ususally" should be "usually".Suggested fix
-is that the application has a set of tools to aid the assistant provide accurate answer, and it is ususally +is that the application has a set of tools to aid the assistant provide accurate answer, and it is usuallyfern/fern/pages/kubernetes/api_reference.mdx-20-25 (1)
20-25: Duplicate paragraph content.Lines 20-21 and 25 contain the same sentence about "Package v1alpha1 contains API Schema definitions". This appears to be unintentional duplication.
Suggested fix
Package v1alpha1 contains API Schema definitions for the nvidia.com v1alpha1 API group. This package defines the DynamoGraphDeploymentRequest (DGDR) custom resource, which provides a high-level, SLA-driven interface for deploying machine learning models on Dynamo. -Package v1alpha1 contains API Schema definitions for the nvidia.com v1alpha1 API group. - ### Resource Typesfern/fern/pages/backends/trtllm/gpt-oss.mdx-176-176 (1)
176-176: Section numbering inconsistency - step 5 is missing.The instructions jump from "### 4. Launch the Deployment" (line 124) to "### 6. Verify the Deployment is Ready" (line 176). Either add step 5 or renumber to maintain sequential ordering.
Suggested fix
-### 6. Verify the Deployment is Ready +### 5. Verify the Deployment is ReadyAnd update subsequent sections (7 → 6, 8 → 7).
fern/fern/pages/observability/README.mdx-37-37 (1)
37-37: Minor grammatical fix: "Documentations" → "Documentation"."Documentation" is typically used as an uncountable noun in English.
📝 Suggested fix
-## Observability Documentations +## Observability Documentationfern/fern/pages/backends/sglang/gpt-oss.mdx-10-11 (1)
10-11: Fix typo: "ues" → "use".📝 Suggested fix
The gpt-oss-120b guide for SGLang is largely identical to the [guide for vLLM](/additional-resources/backend-details/v-llm/gpt-oss), -please ues the vLLM guide as a reference with the different deployment steps as highlighted below: +please use the vLLM guide as a reference with the different deployment steps as highlighted below:fern/fern/pages/multimodal/index.mdx-18-22 (1)
18-22: Fill or remove the empty “Backend Documentation” section.
Right now it’s an orphaned header; either add links to the backend-specific pages or drop the section to avoid a dead spot in the page.fern/fern/pages/api/nixl_connect/descriptor.mdx-10-11 (1)
10-11: Fix small typos/grammar in the Descriptor overview and registration note.
These are user-facing docs, so it’s worth polishing the wording.✏️ Proposed edits
-Memory descriptor that ensures memory is registered with the NIXL-base I/O subsystem. +Memory descriptor that ensures memory is registered with the NIXL-based I/O subsystem. -3. From a Python `bytes` object. Memory is assumed to reside in CPU addressable host memory. +3. From a Python `bytes` object. Memory is assumed to reside in CPU-addressable host memory. -When the descriptor is assigned to a NIXL operation, it will be automatically registered if was not explicitly registered. +When the descriptor is assigned to a NIXL operation, it will be automatically registered if it was not explicitly registered.Also applies to: 21-21, 39-39
fern/fern/pages/backends/trtllm/llama4_plus_eagle.mdx-20-28 (1)
20-28: Clean up the setup note wording to avoid confusion.✏️ Proposed edits
-* Make sure the (`eagle3_one_model: true`) is set in the LLM API config inside the `examples/backends/trtllm/engine_configs/llama4/eagle` folder. +* Make sure `eagle3_one_model: true` is set in the LLM API config inside the `examples/backends/trtllm/engine_configs/llama4/eagle` folder. -Assuming you have already allocated your nodes via `salloc`, and are -inside an interactive shell on one of the allocated nodes, set the -following environment variables based: +Assuming you have already allocated your nodes via `salloc`, and are +inside an interactive shell on one of the allocated nodes, set the +following environment variables based on your environment:fern/fern/pages/api/nixl_connect/README.mdx-10-14 (1)
10-14: Hyphenate compound adjective.“container hosted” should be “container-hosted” for correct grammar.
💡 Suggested fix
-The `dynamo.nixl_connect` library can be imported by any Dynamo container hosted application. +The `dynamo.nixl_connect` library can be imported by any Dynamo container-hosted application.fern/fern/pages/api/nixl_connect/README.mdx-107-110 (1)
107-110: Fix stray “KV$” typo.Looks like a formatting artifact; should read “KV cache”.
💡 Suggested fix
-6. Prefill Worker receives the embeddings from Encode Worker and generates a key-value cache (KV$) update for Decode Worker's LLM and writes the update directly to the GPU memory reserved for the data. +6. Prefill Worker receives the embeddings from Encode Worker and generates a key-value cache (KV) update for Decode Worker's LLM and writes the update directly to the GPU memory reserved for the data.fern/fern/pages/observability/health-checks.mdx-58-62 (1)
58-62: Fix port mismatch in example request.The earlier section says frontend defaults to 8000; this example uses 8080.
💡 Suggested fix
-curl -s localhost:8080/live -q | jq +curl -s localhost:8000/live -q | jqfern/fern/pages/observability/metrics.mdx-20-23 (1)
20-23: Correct lines 107 and 109: they incorrectly state "port 8081 by default".

The table correctly specifies default
-1(disabled), but lines 107 and 109 contradict this by claiming metrics are exposed "on port 8081 by default." The actual default is-1(disabled); 8081 is only the example port shown in documentation. Align these lines with the table to clarify that users must explicitly setDYN_SYSTEM_PORTto enable metrics.fern/fern/pages/backends/trtllm/README.mdx-163-172 (1)
163-172: Duplicate sections: "Client" and "Benchmarking" appear twice.Lines 163-172 contain "Client" and "Benchmarking" sections, but these are duplicated at lines 207-216 with identical content. This redundancy should be removed.
Proposed fix: Remove duplicate sections (lines 207-216)
-## Client
-
-See [client](/components/backends/sg-lang#testing-the-deployment) section to learn how to send request to the deployment.
-
-NOTE: To send a request to a multi-node deployment, target the node which is running `python3 -m dynamo.frontend <args>`.
-
-## Benchmarking
-
-To benchmark your deployment with AIPerf, see this utility script, configuring the
-`model` name and `host` based on your deployment: [perf.sh](https://github.com/ai-dynamo/dynamo/tree/main/benchmarks/llm/perf.sh)
-
 ## Multimodal support

Also applies to: 207-216
fern/fern/pages/backends/trtllm/README.mdx-61-61 (1)
61-61: Fix grammatical error."all of our the common deployment patterns" should be "all of the common deployment patterns" or "all our common deployment patterns".
-Below we provide a guide that lets you run all of our the common deployment patterns on a single node.
+Below we provide a guide that lets you run all of the common deployment patterns on a single node.

fern/fern/pages/multimodal/trtllm.mdx-259-259 (1)
259-259: Verify container build reference path.

Line 259 references `docs/backends/trtllm/README.md#build-container`, which appears to be a path from the old Sphinx docs structure, not the new Fern structure.

-# Container image (build using docs/backends/trtllm/README.md#build-container)
+# Container image (build using /backends/tensor-rt-llm#build-container)
🧹 Nitpick comments (18)
fern/fern/pages/guides/request_plane.mdx (1)
171-195: Consolidate duplicate example sections.The "Complete Example" (lines 171-176) and "Real-World Example" (lines 178-195) sections both reference the same script file
examples/backends/vllm/launch/agg_request_planes.sh. Consider consolidating these into a single section to avoid redundancy.📝 Suggested consolidation
## Complete Example -Here's a complete example showing how to launch a Dynamo deployment with different request planes: - -See [`examples/backends/vllm/launch/agg_request_planes.sh`](https://github.com/ai-dynamo/dynamo/tree/main/examples/backends/vllm/launch/agg_request_planes.sh) for a complete working example that demonstrates launching Dynamo with TCP, HTTP, or NATS request planes. - - -## Real-World Example - -The Dynamo repository includes a complete example demonstrating all three request planes: +The Dynamo repository includes a complete working example demonstrating all three request planes: **Location:** `examples/backends/vllm/launch/agg_request_planes.sh` +See the [source on GitHub](https://github.com/ai-dynamo/dynamo/tree/main/examples/backends/vllm/launch/agg_request_planes.sh). + ```bash cd examples/backends/vllm/launchfern/fern/pages/development/backend-guide.mdx (2)
19-58: Code example has missing imports and no language identifier.

The code block is missing a language identifier for proper syntax highlighting (should be `python`), and the example is missing imports for `uvloop` and `asyncio` which are used at lines 56-57.
python), and the example is missing imports foruvloopandasynciowhich are used at lines 56-57.📝 Suggested improvements
-```
+```python
 from dynamo.llm import ModelInput, ModelType, register_llm
 from dynamo.runtime import DistributedRuntime, dynamo_worker
+import asyncio
+import uvloop
87-87: Minor grammar: use hyphenated "load-balanced" as compound adjective.

📝 Suggested fix
-* *Component*: A load balanced service needed to run that pipeline.
+* *Component*: A load-balanced service needed to run that pipeline.

fern/fern/pages/getting-started/intro.mdx (1)
1-72: Clarify the purpose of this file vs. `quickstart.mdx` - content appears duplicated.

This file (`intro.mdx`) has the same title "Welcome to NVIDIA Dynamo" and nearly identical content as `quickstart.mdx` (lines 1-72 are essentially the same). Having two files with the same title in the same directory will cause confusion in navigation and SEO.

Consider one of the following:
- Remove one file if they serve the same purpose
- Differentiate the content - e.g., make
`intro.mdx` a high-level overview without quickstart commands, and keep `quickstart.mdx` focused on hands-on setup
fern/fern/pages/benchmarks/sla_driven_profiling.mdx (1)
106-106: Optional: Minor grammar and style improvements.Consider these minor refinements for more formal technical writing:
- Line 106: "fix this issue" → consider "resolve this issue" for more formal wording
- Line 108: Hyphenate compound adjectives: "per GPU throughput" → "per-GPU throughput", "y coordinate" → "y-coordinate"
- Line 112: Hyphenate compound adjective: "computation bound MLP kernel" → "computation-bound MLP kernel"
📝 Suggested style improvements
-We are working on framework-side change to fix this issue. For example, the below plot shows the decode parallelization mapping sweep results for H100 for deepseek-ai/DeepSeek-R1-Distill-Llama-8B.
+We are working on framework-side change to resolve this issue. For example, the below plot shows the decode parallelization mapping sweep results for H100 for deepseek-ai/DeepSeek-R1-Distill-Llama-8B.

-4. **Recommendation**: Selects optimal parallelization mapping for prefill and decode that achieves the highest per GPU throughput while adhering the SLA on TTFT and ITL. Specifically, the profiler will choose the point (or a point on the curve for decode) that is left to the vertical red dashed line that represents the SLAs while has the highest y coordinate (throughput per GPU).
+4. **Recommendation**: Selects optimal parallelization mapping for prefill and decode that achieves the highest per-GPU throughput while adhering the SLA on TTFT and ITL. Specifically, the profiler will choose the point (or a point on the curve for decode) that is left to the vertical red dashed line that represents the SLAs while has the highest y-coordinate (throughput per GPU).

-The active kv usage determines the complexity of the memory-bounded attention kernel while the active kv usage divided the average context length determines the complexity of the computation bound MLP kernel.
+The active kv usage determines the complexity of the memory-bounded attention kernel while the active kv usage divided the average context length determines the complexity of the computation-bound MLP kernel.

Also applies to: 108-108, 112-112
fern/fern/pages/benchmarks/kv-router-ab-testing.mdx (1)
797-801: Consider clarifying that files are user-created, not provided.

The appendix references files like `prepare-dataset.sh` and "Results CSVs" that aren't provided in the guide. Users create these locally by following the steps. Consider adding a brief note clarifying this, or removing references to files that are implicitly created through the guide's commands.

📝 Suggested clarification
 ## Appendix: Files Reference

-- `router-off-deployment.yaml`: Standard routing deployment
-- `router-on-deployment.yaml`: KV router enabled deployment
-- `benchmark-job.yaml`: AIPerf benchmark pod
-- `prepare-dataset.sh`: Dataset preparation script
-- Results CSVs: Detailed metrics from AIPerf
+**Files you create during this guide:**
+- `router-off-deployment.yaml`: Standard routing deployment
+- `router-on-deployment.yaml`: KV router enabled deployment
+- `benchmark-job.yaml`: AIPerf benchmark pod
+- `mooncake_trace_4x.jsonl`: Prepared benchmark dataset
+- `router_off_results.csv` / `router_on_results.csv`: AIPerf output metrics

fern/fern/pages/kvbm/trtllm-setup.mdx (1)
100-100: Minor grammatical issue: missing subject.

The sentence "Alternatively, can use..." is missing a subject.
Suggested fix
-Alternatively, can use "trtllm-serve" with KVBM by replacing the above two [DYNAMO] cmds with below:
+Alternatively, you can use `trtllm-serve` with KVBM by replacing the above two [DYNAMO] commands with the following:

fern/fern/pages/kvbm/kvbm_integrations.mdx (2)
32-33: Consider escaping or spacing the ampersand for reliable rendering.

`Host&Disk` may render inconsistently across Markdown processors. Using `Host & Disk` or `Host and Disk` would be safer.
Host&Diskmay render inconsistently across Markdown processors. UsingHost & DiskorHost and Diskwould be safer.Suggested fix
-
-**Offloading blocks from Device to Host&Disk**
+
+**Offloading blocks from Device to Host & Disk**
10-11: Consider breaking up dense paragraph for readability.

Lines 10-11 contain a lot of information in a single block. For documentation clarity, consider using bullet points or splitting into multiple paragraphs to separate the Scheduler and Worker component descriptions.
fern/fern/pages/kvbm/vllm-setup.mdx (1)
90-90: Minor grammatical issue: missing subject (same as trtllm-setup).

Suggested fix
-Alternatively, can use `vllm serve` directly to use KVBM for aggregated serving:
+Alternatively, you can use `vllm serve` directly to use KVBM for aggregated serving:
14-20: Optional: consolidate repeated “For …” sentences into a short “See also” list (Line 14–Line 20).
Helps flow without changing meaning.♻️ Suggested refactor
-**For the complete and authoritative list of all vLLM metrics**, always refer to the [official vLLM Metrics Design documentation](https://docs.vllm.ai/en/latest/design/metrics.html).
-
-**For LMCache metrics and integration**, see the [LMCache Integration Guide](/components/kvbm/lm-cache-integration).
-
-**For Dynamo runtime metrics**, see the [Dynamo Metrics Guide](/user-guides/observability-local/metrics).
-
-**For visualization setup instructions**, see the [Prometheus and Grafana Setup Guide](/user-guides/observability-local/prometheus-grafana-setup).
+**See also:**
+- [vLLM Metrics Design Documentation](https://docs.vllm.ai/en/latest/design/metrics.html)
+- [LMCache Integration Guide](/components/kvbm/lm-cache-integration)
+- [Dynamo Metrics Guide](/user-guides/observability-local/metrics)
+- [Prometheus and Grafana Setup Guide](/user-guides/observability-local/prometheus-grafana-setup)
138-138: Minor style suggestion: simplify phrasing.Consider simplifying "in order to support" to "to support" for conciseness.
✏️ Suggested change
-We are in the process of shipping pre-built docker containers that contain installations of DeepEP, DeepGEMM, and NVSHMEM in order to support WideEP and P/D. For now, you can quickly build the container from source with the following command. +We are in the process of shipping pre-built docker containers that contain installations of DeepEP, DeepGEMM, and NVSHMEM to support WideEP and P/D. For now, you can quickly build the container from source with the following command.fern/fern/pages/kubernetes/grove.mdx (1)
99-105: Consider varying sentence structure for better flow.Three consecutive sentences begin with "For". Consider rewording for variety.
✏️ Suggested revision
-For KAI Scheduler, see the [KAI Scheduler Deployment Guide](https://github.com/NVIDIA/KAI-Scheduler). +See the [KAI Scheduler Deployment Guide](https://github.com/NVIDIA/KAI-Scheduler) for installation instructions. -For installation instructions, see the [Grove Installation Guide](https://github.com/NVIDIA/grove/blob/main/docs/installation.md). +The [Grove Installation Guide](https://github.com/NVIDIA/grove/blob/main/docs/installation.md) provides detailed setup steps. -For practical examples of Grove-based multinode deployments in action, see the [Multinode Deployment Guide](/kubernetes-deployment/multinode/multinode-deployments), which demonstrates multi-node disaggregated serving scenarios. +Practical examples of Grove-based multinode deployments are available in the [Multinode Deployment Guide](/kubernetes-deployment/multinode/multinode-deployments), which demonstrates multi-node disaggregated serving scenarios.fern/fern/pages/kubernetes/deployment/multinode-deployment.mdx (1)
147-155: Minor inconsistency in YAML value formatting.At line 139,
nodeCount: 2is shown without quotes, but at line 154, the examples showmultinode.nodeCount: "2"with quotes. While both may work, the documentation should be consistent about whether these are integer or string values.fern/fern/pages/multimodal/sglang.mdx (1)
72-74: GitHub source links use blob paths instead of tree paths.The links to source files use
/tree/main/which is typically for directories. For individual files, GitHub expects/blob/main/. However, GitHub auto-redirects tree→blob for files, so this works but is not canonical.🔧 Optional: Use canonical blob paths for file links
- - [MultimodalEncodeWorkerHandler](https://github.com/ai-dynamo/dynamo/tree/main/components/src/dynamo/sglang/request_handlers/multimodal/encode_worker_handler.py) for encoding - - [MultimodalWorkerHandler](https://github.com/ai-dynamo/dynamo/tree/main/components/src/dynamo/sglang/request_handlers/multimodal/worker_handler.py) for prefilling and decoding. - - [MultimodalProcessorHandler](https://github.com/ai-dynamo/dynamo/tree/main/components/src/dynamo/sglang/request_handlers/multimodal/processor_handler.py) + - [MultimodalEncodeWorkerHandler](https://github.com/ai-dynamo/dynamo/blob/main/components/src/dynamo/sglang/request_handlers/multimodal/encode_worker_handler.py) for encoding + - [MultimodalWorkerHandler](https://github.com/ai-dynamo/dynamo/blob/main/components/src/dynamo/sglang/request_handlers/multimodal/worker_handler.py) for prefilling and decoding. + - [MultimodalProcessorHandler](https://github.com/ai-dynamo/dynamo/blob/main/components/src/dynamo/sglang/request_handlers/multimodal/processor_handler.py)fern/fern/pages/api/nixl_connect/device_kind.mdx (1)
24-33: Consider addingReadableOperationto related classes.The Related Classes section lists
WritableOperationbut notReadableOperation. Based on the AI summary mentioning bothReadableOperationandWritableOperationas part of the NIXL Connect API, consider addingReadableOperationfor completeness.Suggested addition
- [RdmaMetadata](/additional-resources/api-reference/nixl-connect/rdma-metadata) - [ReadOperation](/additional-resources/api-reference/nixl-connect/read-operation) + - [ReadableOperation](/additional-resources/api-reference/nixl-connect/readable-operation) - [WritableOperation](/additional-resources/api-reference/nixl-connect/writable-operation) - [WriteOperation](/additional-resources/api-reference/nixl-connect/write-operation)fern/fern/pages/api/nixl_connect/connector.mdx (1)
13-13: Tighten the GPU Direct RDMA sentence for readability.✏️ Suggested rewording
-This class provides a "pythonic" interface using NIXL library to utilize GPU Direct RDMA accelerated, when available, data transfers between models hosted by different workers in a Dynamo graph. +This class provides a "pythonic" interface to the NIXL library for GPU Direct RDMA–accelerated data transfers (when available) between models hosted by different workers in a Dynamo graph.fern/fern/pages/kubernetes/installation_guide.mdx (1)
335-338: Minor: Capitalize "just" at start of sentence.-just add the following to the helm install command: +Just add the following to the helm install command:
902f206 to
a653d66
Compare
|
@grahamking I think I got the relative links working, so the docs will still work in VSCode/GitHub. Shouldn't be any breaking changes. |
|
@coderabbitai Review |
bc28d24 to
318e06a
Compare
|
/ok to test 2863687 |
Signed-off-by: Jont828 <jt572@cornell.edu>
Signed-off-by: Jont828 <jt572@cornell.edu>
Signed-off-by: Jont828 <jt572@cornell.edu>
Signed-off-by: Neal Vaidya <nealv@nvidia.com>
Signed-off-by: Jont828 <jt572@cornell.edu>
Signed-off-by: Jont828 <jt572@cornell.edu>
2863687 to
db98ba4
Compare
|
/ok to test db98ba4 |
|
This is great - my biggest questions are more operational:
|
|
@dagil-nvidia Great questions!
|
|
/ok to test 16dae4f |
Signed-off-by: Jont828 <jt572@cornell.edu> Signed-off-by: Neal Vaidya <nealv@nvidia.com> Co-authored-by: Neal Vaidya <nealv@nvidia.com>
Signed-off-by: Jont828 <jt572@cornell.edu> Signed-off-by: Neal Vaidya <nealv@nvidia.com> Co-authored-by: Neal Vaidya <nealv@nvidia.com>
Overview:
I'd like to migrate the docs to Fern since it can easily generate docs, provide versioned docs (which currently do not work on the site), and fix the issues with relative/absolute link paths. This allows us to easily translate the MD docs into a website and removes the need for maintaining a dedicated doc generation script with regex for replacing links, as well as a complicated CI flow for deploying the docs.
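For context, the kind of regex link rewriting the legacy pipeline had to maintain can be sketched roughly like this (a hypothetical illustration of the problem Fern removes, not the actual script; the function name and path convention are assumptions):

```python
import re

# Hypothetical sketch: turn repo-relative "docs/foo/bar.md" links inside
# Markdown into site paths like "/foo/bar", leaving external URLs alone.
# This is the kind of fragile rewriting a Sphinx-era pipeline needed.
LINK_RE = re.compile(r"\[([^\]]+)\]\((?!https?://)([^)#]+)\.md(#[^)]*)?\)")

def rewrite_links(markdown: str) -> str:
    def repl(m: re.Match) -> str:
        text, path, anchor = m.group(1), m.group(2), m.group(3) or ""
        site_path = "/" + path.removeprefix("docs/").strip("/")
        return f"[{text}]({site_path}{anchor})"
    return LINK_RE.sub(repl, markdown)

print(rewrite_links("See [the guide](docs/backends/trtllm/README.md#build-container)."))
```

With Fern resolving links natively, this whole rewrite step (and the CI that ran it) goes away.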
These new docs are added under the
`fern` directory and exist in parallel to the existing Sphinx doc generation. Once the migration is complete, the contents of the `fern/` folder will replace the `docs/` folder and the Sphinx doc generation will be removed. This allows the new doc site to be deployed and tested without breaking any existing functionality.

Replaces the Docusaurus WIP PR in #5382 after discussing with maintainers.
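While the two trees coexist, a quick sanity check that repo-relative links inside the migrated MDX pages still resolve on disk (which is what keeps them browsable on GitHub and in VS Code) could look like the following hypothetical helper; it is illustrative only and not part of this PR:

```python
from pathlib import Path
import re

# Hypothetical migration-period check: every repo-relative link inside
# fern/**/*.mdx should point at a file that actually exists on disk.
# External URLs, absolute site paths, anchors, and mailto links are skipped.
LINK_RE = re.compile(r"\]\((?!https?://|/|#|mailto:)([^)#]+)\)")

def broken_links(root: Path) -> list[tuple[Path, str]]:
    broken = []
    for page in root.rglob("*.mdx"):
        for target in LINK_RE.findall(page.read_text(encoding="utf-8")):
            if not (page.parent / target).exists():
                broken.append((page, target))
    return broken
```

Running something like this in CI during the transition would catch any page whose relative links break as files move between `docs/` and `fern/`.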
The site is already published off of this PR, try it here!
This is an example of the resulting docs page.

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.