
docs: migrate existing docs to fern #5445

Merged
nealvaidya merged 8 commits into ai-dynamo:main from Jont828:fern-migration-parallel
Jan 26, 2026

Conversation

@Jont828
Contributor

@Jont828 Jont828 commented Jan 15, 2026

Overview:

I'd like to migrate the docs to Fern because it can easily generate docs, provides versioned docs (which currently do not work on the site), and fixes the issues with relative/absolute link paths. This lets us easily translate the Markdown docs into a website and removes the need to maintain a dedicated doc-generation script (with regex for replacing links) as well as a complicated CI flow for deploying the docs.

These new docs are added under the fern directory and exist in parallel to the existing Sphinx doc generation. Once the migration is complete, the contents of the fern/ folder will replace the docs/ folder and the Sphinx doc generation will be removed. This allows the new doc site to be deployed and tested without breaking any existing functionality.

Replaces the Docusaurus WIP PR in #5382 after discussing with maintainers.

The site is already published off of this PR; try it here!

This is an example of the resulting docs page.

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • Documentation
    • Large documentation expansion: getting started, backend guides (vLLM, SGLang, TRT-LLM), Kubernetes/operator guides, observability, KV Block Manager, multimodal support, performance tuning, planner/SLA guides, APIs, examples and many design/developer reference pages.
  • Chores
    • Added project-level config and updated repository ignore rules to improve docs/site build and repo hygiene.


@Jont828 Jont828 requested review from a team as code owners January 15, 2026 02:21
@copy-pr-bot

copy-pr-bot bot commented Jan 15, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions

👋 Hi Jont828! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test GitHub Validation CI runs an essential subset of the testing framework to quickly catch errors. Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

@github-actions github-actions bot added the `docs` and `external-contribution` (Pull request is from an external contributor) labels Jan 15, 2026
@Jont828 Jont828 force-pushed the fern-migration-parallel branch from d1054e5 to 120fba5 Compare January 15, 2026 02:38
@Jont828 Jont828 force-pushed the fern-migration-parallel branch 2 times, most recently from 3fadd81 to a3d14ef Compare January 15, 2026 02:49
@nealvaidya
Contributor

/ok to test a3d14ef

@coderabbitai
Contributor

coderabbitai bot commented Jan 15, 2026

Walkthrough

Adds Fern site configuration and many new documentation files (MDX) across docs for backends, APIs (nixl_connect), Kubernetes, observability, benchmarking, architecture, multimodal, KVBM, planners, and developer guides; also updates .gitignore and a GitHub filter entry.

Changes

Cohort / File(s) Summary
Site config & repo metadata
\.github/filters.yaml, fern/fern/fern.config.json, fern/fern/docs.yml
Added GitHub filter for fern/**, project metadata (organization/version), and docs site config (instances, versions, navbar, branding, logo, favicon).
VCS ignore rules
fern/fern/.gitignore
Added ignore patterns (e.g., **/*.preview, **/*.definition) and an explicit negate for !*.svg.
Version placeholders
fern/fern/pages-v*/coming-soon.mdx (multiple)
Added "coming soon" pages for historical versions (v0.1.0–v0.7.1).
NIXL Connect API docs
fern/fern/pages/api/nixl_connect/* (README.mdx, connector.mdx, descriptor.mdx, device*.mdx, operation_*.mdx, rdma_metadata.mdx, operation_status.mdx)
New reference docs covering Connector, Descriptor, Device/DeviceKind, Read/Write operations, RdmaMetadata, and OperationStatus.
Agent/tool-calling docs
fern/fern/pages/agents/tool-calling.mdx
New doc explaining tool calling, parsers, mappings, and examples.
Backends — SGLang / TRTLLM / vLLM
fern/fern/pages/backends/sglang/*, fern/fern/pages/backends/trtllm/*, fern/fern/pages/backends/vllm/*
Large set of backend guides: quickstarts, disaggregation, profiling, Prometheus metrics, examples, multimode/feature-specific guides, and deployment notes.
Benchmarks & performance
fern/fern/pages/benchmarks/*, fern/fern/pages/performance/*
Added benchmarking guides (client/server, KV router A/B testing, SLA-driven profiling) and performance tuning docs.
Architecture & design
fern/fern/pages/design-docs/*, fern/fern/pages/fault-tolerance/*, fern/fern/pages/planner/*
Added architecture overviews, disaggregated serving, distributed runtime, main flows, request migration and cancellation, and planner (load/SLA) docs.
Kubernetes platform & operator
fern/fern/pages/kubernetes/*, docs/kubernetes/dynamo_operator.md
Extensive Kubernetes docs: installation, operator, CRD API reference (auto-generated), autoscaling, deployment guides, multinode, model caching, webhooks, FluxCD, observability for k8s. Minor markdown fence fix in docs/kubernetes/dynamo_operator.md.
KVBM (KV Block Manager)
fern/fern/pages/kvbm/*
New KVBM design, components, deep dive, integrations, setup guides (vllm/trtllm), motivation, and reading links.
Multimodal
fern/fern/pages/multimodal/*
Added multimodal support docs for vLLM/TRT-LLM/SGLang, encoding workflows, patterns, and examples.
Observability & monitoring
fern/fern/pages/observability/*
Added observability quickstart, metrics, Prometheus/Grafana guides, tracing, logging, health checks, and a metrics developer guide.
Developer & getting started
fern/fern/pages/development/*, fern/fern/pages/getting-started/*, fern/fern/pages/frontends/kserve.mdx
Added developer guides for creating workers, runtime guide, KServe frontend doc, intro, installation, quickstart, examples, and support matrix.
Guides & misc
fern/fern/pages/guides/*, fern/fern/pages/multimodal/*, other new pages
Added guides (JailedStream, request_plane), multimodal backend specifics, examples, and supporting documentation across many areas.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I hopped in with a doc-shaped carrot,
Pages sprouted where silence sat,
Config stitched, examples scattered,
Backends, metrics—every hat!
A tiny rabbit cheers the vaulting stack.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'docs: migrate existing docs to fern' accurately and concisely summarizes the main objective of the changeset, clearly indicating the documentation framework migration.
Description check ✅ Passed The PR description includes an Overview section explaining the migration goals, a Details section describing the parallel approach, and Related Issues section, though it lacks a specific 'Where should the reviewer start?' section with file recommendations.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.



Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 11

Note

Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.

🤖 Fix all issues with AI agents
In `@fern/fern/fern.config.json`:
- Around line 1-4: Update the Fern CLI version in fern.config.json by replacing
the non-existent "version": "3.42.1" value with a valid released version (e.g.,
"3.29.1"); modify the "version" field in the JSON so the project uses a
published Fern CLI release to avoid installation/build failures.
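With the suggested version applied, fern.config.json would be a small JSON file along these lines (a sketch: the field names follow Fern's standard config, the version value is the bot's suggestion, and the organization value is illustrative):

```json
{
  "organization": "ai-dynamo",
  "version": "3.29.1"
}
```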

In `@fern/fern/pages/agents/tool-calling.mdx`:
- Around line 55-72: The client example's base_url port mismatches the frontend
launch: update the OpenAI client base_url value (the base_url argument in the
example) to use port 8000 (http://localhost:8000/v1) to match the default
frontend started by python -m dynamo.frontend, or alternatively modify the
frontend launch command (python -m dynamo.frontend) to explicitly set
--http-port 8081 so it matches the current base_url; ensure you update either
the base_url in the example or add the --http-port flag to the python -m
dynamo.frontend command so both use the same port.
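One way to make this class of mismatch hard to reintroduce is to derive both the launch command and the client base_url from a single port constant. A minimal stdlib-only sketch (the `--http-port` flag and `dynamo.frontend` module names come from the review comment; the constant and variable names are hypothetical):

```python
# Hypothetical sketch: one constant drives both the frontend launch
# command and the client base_url, so the two cannot drift apart.
FRONTEND_HTTP_PORT = 8000  # default port of `python -m dynamo.frontend`

launch_cmd = f"python -m dynamo.frontend --http-port {FRONTEND_HTTP_PORT}"
base_url = f"http://localhost:{FRONTEND_HTTP_PORT}/v1"

print(launch_cmd)
print(base_url)
```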

In `@fern/fern/pages/backends/sglang/sgl-hicache-example.mdx`:
- Around line 14-36: Update the example so the SGLang worker and frontend use
different ports: change the worker invocation flag "--port 8000" in the `python
-m dynamo.sglang` example to an unused port (e.g. "--port 8001") while leaving
`python -m dynamo.frontend --http-port 8000` unchanged; ensure both command
examples in the file reference the new worker port to avoid the port binding
conflict.
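The same idea can be applied in the shell snippets themselves: name both ports once and fail fast if they collide, instead of hard-coding 8000 in two places. A hedged sketch (the dynamo commands are abbreviated and commented out; only the port check runs):

```shell
# Hypothetical sketch: declare both ports up front and fail fast if
# they collide.
FRONTEND_PORT=8000
WORKER_PORT=8001

if [ "$FRONTEND_PORT" = "$WORKER_PORT" ]; then
  echo "frontend and worker ports collide" >&2
  exit 1
fi
echo "ports ok: frontend=$FRONTEND_PORT worker=$WORKER_PORT"

# python -m dynamo.frontend --http-port "$FRONTEND_PORT" &
# python -m dynamo.sglang ... --port "$WORKER_PORT"
```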

In `@fern/fern/pages/backends/vllm/multi-node.mdx`:
- Around line 84-88: The multi-line shell command is missing a trailing
backslash on the model line ("--model meta-llama/Llama-3.3-70B-Instruct"),
causing a shell syntax error; fix it by adding a trailing backslash to that line
so the command continuation lines ("--tensor-parallel-size 8 \" and
"--enforce-eager") are correctly joined into one multi-line command.
- Around line 93-97: The shell command snippet is missing a trailing backslash
on the line with the --tensor-parallel-size flag causing a syntax error; fix it
by adding a backslash at the end of the line containing "--tensor-parallel-size
8" so the command lines properly continue (keep the existing backslashes on
other lines like "--enforce-eager \" unchanged).
- Around line 78-98: The disaggregated example has swapped comments and flags
for Node 1 and Node 2: update the Node 1 block (the "Node 1" header and the
python -m dynamo.vllm invocation) to label it "Run ingress and decode worker"
and change the inline comment to "Start decode worker" (keeping the dynamo.vllm
command without --is-prefill-worker), and update the Node 2 block (the second
python -m dynamo.vllm invocation) to label it "Run prefill worker" and change
its inline comment to "Start prefill worker" while retaining the
--is-prefill-worker flag; ensure the python -m dynamo.frontend line remains as
the ingress start and that the flag --is-prefill-worker appears only in the
prefill worker command.
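Missing trailing backslashes in multi-line commands like these are easy to lint for mechanically. A small stdlib-only checker sketch (the heuristic, function name, and logic are hypothetical: any line followed by a `--flag` line should end with a backslash):

```python
def find_missing_continuations(cmd: str) -> list[int]:
    """Return 1-based line numbers that are followed by an option line
    but do not end with a trailing backslash."""
    lines = cmd.rstrip().splitlines()
    missing = []
    for i, line in enumerate(lines[:-1]):
        next_is_flag = lines[i + 1].lstrip().startswith("--")
        if next_is_flag and not line.rstrip().endswith("\\"):
            missing.append(i + 1)
    return missing


# The broken snippet from the review comment: the --model line lacks
# a trailing backslash (raw string keeps the backslashes literal).
broken = r"""python -m dynamo.vllm \
  --model meta-llama/Llama-3.3-70B-Instruct
  --tensor-parallel-size 8 \
  --enforce-eager"""

print(find_missing_continuations(broken))  # → [2]
```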

In `@fern/fern/pages/design-docs/distributed_runtime.mdx`:
- Around line 34-37: The admonition block beginning with ":::caution" is closed
incorrectly with triple backticks; locate the admonition start (:::caution) and
replace the closing backticks with the matching closing marker ":::", ensuring
the block is opened with ":::caution" and closed with ":::".
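For reference, a corrected admonition opens and closes with the same triple-colon marker; a minimal sketch of the fixed shape (the body text here is a placeholder, not the file's actual content):

```mdx
:::caution
Placeholder body text for the admonition.
:::
```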

In `@fern/fern/pages/frontends/kserve.mdx`:
- Line 97: The doc contains GitHub links pointing to tree/main which will break
versioned docs; update the two links referencing lib/llm/src/protocols/tensor.rs
and the two links referencing lib/bindings/python/tests/test_tensor.py (as seen
around the TensorModelConfig paragraph in kserve.mdx) to use versioned
references (a specific tag like vX.Y.Z, a commit SHA, or relative repository
paths) instead of tree/main so they resolve to the correct code for each
published doc version; ensure all four occurrences on and around the
TensorModelConfig paragraph are replaced consistently.

In `@fern/fern/pages/getting-started/intro.mdx`:
- Around line 77-94: The three documentation links under "Architecture" are
pointing to the wrong directory and one has the wrong filename: update the
System Architecture link (`./design_docs/architecture`) to
`./design-docs/architecture`, update the Disaggregated Serving link
(`./design_docs/disagg_serving`) to `./design-docs/disagg-serving`, and update
the Distributed Runtime link (`./design_docs/distributed_runtime`) to
`./design-docs/distributed_runtime` so the directory uses the hyphenated name
`design-docs` and the disaggregated file uses the hyphenated filename
`disagg-serving`.

In `@fern/fern/pages/getting-started/quickstart.mdx`:
- Around line 76-100: The three Markdown links in the Architecture section use
the wrong directory and file name separators; update the link targets to the
correct paths: change `./design_docs/architecture` to
`./design-docs/architecture`, change `./design_docs/disagg_serving` to
`./design-docs/disagg-serving`, and change `./design_docs/distributed_runtime`
to `./design-docs/distributed_runtime` so they point to the existing
`design-docs` directory and the hyphenated `disagg-serving` file.
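As a sketch, the corrected Architecture links would read as follows (the link text shown is illustrative; only the targets come from the review comment):

```markdown
- [System Architecture](./design-docs/architecture)
- [Disaggregated Serving](./design-docs/disagg-serving)
- [Distributed Runtime](./design-docs/distributed_runtime)
```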
🟡 Minor comments (64)
fern/fern/pages/kubernetes/deployment/minikube.mdx-26-34 (1)

26-34: Provide separate commands for GPU and non-GPU setups.

The command includes --gpus all unconditionally, but the comment says "if configured". Users without GPUs will encounter an error when running this command. Consider providing two separate commands to avoid confusion.

Suggested fix
 ```bash
-# Start Minikube with GPU support (if configured)
-minikube start --driver docker --container-runtime docker --gpus all --memory=16000mb --cpus=8
+# Start Minikube without GPU support
+minikube start --driver docker --container-runtime docker --memory=16000mb --cpus=8
+
+# Or, start Minikube with GPU support (if configured in step 2)
+# minikube start --driver docker --container-runtime docker --gpus all --memory=16000mb --cpus=8
 
 # Enable required addons
 minikube addons enable istio-provisioner
fern/fern/pages/guides/request_plane.mdx-163-167 (1)

163-167: Fix grammar issues in NATS usage section.

Two minor issues on line 165:

  1. "KV based routing" should be hyphenated as "KV-based routing"
  2. Subject-verb agreement: "routing require" should be "routing requires"
📝 Suggested fix
 **When to use NATS:**
 - Production deployments with service discovery
-- Currently KV based routing require NATS. If you want to completely disable NATS, KV based routing won't be available
+- Currently KV-based routing requires NATS. If you want to completely disable NATS, KV-based routing won't be available
 - Need for message replay and persistence features
fern/fern/pages/guides/jail_stream_readme.mdx-26-27 (1)

26-27: Correct the example file path — jail_example.rs does not exist in the codebase.

The documentation references lib/llm/src/protocols/openai/chat_completions/jail_example.rs for examples, but this file does not exist. The main implementation file at lib/llm/src/protocols/openai/chat_completions/jail.rs exists and is correct. Update the examples path to point to the actual location where examples or usage are documented (possibly lib/llm/tests/test_jail.rs or another file).

fern/fern/pages/frontends/kserve.mdx-12-12 (1)

12-12: Fix compound adjective hyphenation in multiple locations.

Several compound adjectives should be hyphenated per standard English grammar: "industry-standard", "tensor-based", "KServe-based", and "client-side".

📝 Proposed fixes for hyphenation

Line 12:

-[KServe v2 API](https://github.com/kserve/kserve/tree/master/docs/predict-api/v2) is one of the industry standard protocol for machine learning model inference.
+[KServe v2 API](https://github.com/kserve/kserve/tree/master/docs/predict-api/v2) is one of the industry-standard protocols for machine learning model inference.

Line 35:

-* `ModelType::TensorBased` and `ModelInput::Tensor`: Combination for backend that is used for generic tensor based inference
+* `ModelType::TensorBased` and `ModelInput::Tensor`: Combination for backend that is used for generic tensor-based inference

Line 41:

-Most of the Dynamo features are tailored for LLM inference and the combinations that are backed by OpenAI API can enable those features and are best suited for exploring those Dynamo features. However, this implies specific conversion between generic tensor based messages and OpenAI message and imposes specific structure of the KServe request message.
+Most of the Dynamo features are tailored for LLM inference and the combinations that are backed by OpenAI API can enable those features and are best suited for exploring those Dynamo features. However, this implies specific conversion between generic tensor-based messages and OpenAI message and imposes specific structure of the KServe request message.

Line 92:

-This combination is used when the user is migrating an existing KServe based backend into Dynamo ecosystem.
+This combination is used when the user is migrating an existing KServe-based backend into Dynamo ecosystem.

Line 96:

-When registering the backend, the backend must provide the model's metadata as tensor based deployment is generic and the frontend can't make any assumptions like for OpenAI Completions model.
+When registering the backend, the backend must provide the model's metadata as tensor-based deployment is generic and the frontend can't make any assumptions like for OpenAI Completions model.

Line 98:

-* [triton_model_config](https://github.com/ai-dynamo/dynamo/tree/main/lib/llm/src/protocols/tensor.rs): For users that already have Triton model config and require the full config to be returned for client side logic, they can set the config in `TensorModelConfig::triton_model_config` which will supersedes other fields in `TensorModelConfig` and be used for endpoint responses.
+* [triton_model_config](https://github.com/ai-dynamo/dynamo/tree/main/lib/llm/src/protocols/tensor.rs): For users that already have Triton model config and require the full config to be returned for client-side logic, they can set the config in `TensorModelConfig::triton_model_config` which will supersede other fields in `TensorModelConfig` and be used for endpoint responses.

Also applies to: 35-35, 41-41, 92-92, 96-96, 98-98

fern/fern/pages/backends/vllm/deepseek-r1.mdx-10-10 (1)

10-10: Typo: "seperate" → "separate".

📝 Proposed fix
-Dynamo supports running Deepseek R1 with data parallel attention and wide expert parallelism. Each data parallel attention rank is a seperate dynamo component that will emit its own KV Events and Metrics. vLLM controls the expert parallelism using the flag `--enable-expert-parallel`
+Dynamo supports running Deepseek R1 with data parallel attention and wide expert parallelism. Each data parallel attention rank is a separate dynamo component that will emit its own KV Events and Metrics. vLLM controls the expert parallelism using the flag `--enable-expert-parallel`
fern/fern/pages/kubernetes/dynamo_operator.mdx-84-107 (1)

84-107: Unclosed code block causes rendering issues.

The bash code block starting at line 84 is missing a closing fence before the "Observability" section at line 97. This will cause the observability heading and subsequent content to render incorrectly (likely as part of the code block or with broken formatting).

📝 Proposed fix
   --set dynamo-operator.controllerManager.manager.image.tag=v2.0.0-beta
+```
 
 **Observability:**
fern/fern/pages/getting-started/support-matrix.mdx-72-74 (1)

72-74: Outdated release date needs updating.

The callout states v0.8.0 is "planned for January 14, 2025", but that date has passed. Update to reflect current status (either released or the actual planned date).

Suggested fix
 <Callout intent="info">
-**main (ToT)** reflects the current development branch. **v0.8.0** is the upcoming release (planned for January 14, 2025) and not yet available.
+**main (ToT)** reflects the current development branch. **v0.8.0** is the upcoming release and not yet available.
 </Callout>
fern/fern/pages/getting-started/support-matrix.mdx-14-17 (1)

14-17: Clarify ARM64 wheel availability.

The table indicates ARM64 CPU architecture is "Supported", but based on learnings, the project does not ship ARM64 wheels. Consider clarifying that ARM64 is supported via Docker images only, not pip wheels, to avoid confusion.

fern/fern/pages/getting-started/examples.mdx-55-59 (1)

55-59: Internal link paths require correction.

The relative paths in the "Next Steps" section are incorrect. The directories backends, kubernetes, and agents are siblings of getting-started at the pages root level, not child directories. The current paths ./backends/vllm/README, ./kubernetes/README, and ./agents/tool-calling will fail to resolve. Use ../ to navigate up to the pages level first.

Suggested fix
 ## Next Steps
 
-- See the [Backends documentation](./backends/vllm/README) for detailed backend configuration
-- Check [Kubernetes Deployment](./kubernetes/README) for production deployments
-- Review [User Guides](./agents/tool-calling) for advanced features
+- See the [Backends documentation](../backends/vllm/README) for detailed backend configuration
+- Check [Kubernetes Deployment](../kubernetes/README) for production deployments
+- Review [User Guides](../agents/tool-calling) for advanced features
fern/fern/pages-v0.6.0/coming-soon.mdx-7-11 (1)

7-11: Terminology inconsistency: "Latest" vs "Next".

The page refers users to the "Latest version", but in docs.yml the current/development version is labeled "Next" (display-name). Consider aligning the terminology to avoid confusion.

📝 Suggested fix
 <Callout intent="info">
 Documentation for this version is coming soon.
 </Callout>

-This version's documentation is being migrated. Please check back later or use the **Latest** version for the most up-to-date documentation.
+This version's documentation is being migrated. Please check back later or use the **Next** version for the most up-to-date documentation.

Alternatively, if "Latest" is the intended user-facing term, update the display-name in docs.yml.

fern/fern/pages/kvbm/kvbm_motivation.mdx-12-17 (1)

12-17: Grammar and clarity issue in bullet point.

Line 15 has awkward phrasing: "Modular and need simplified UX and to be memory safe" doesn't read clearly. Consider revising for clarity.

📝 Suggested fix
 * Tailored for GenAI use-cases
 * Lack of visibility into real-time block usage patterns.
 * Need for lightweight, ownership-driven memory management over complex object stores with unneeded overheads.
-* Modular and need simplified UX and to be memory safe.
+* Need for modular, memory-safe design with simplified UX.
 * Inability to differentiate between hot (frequently accessed) and cold (infrequently accessed) blocks across the stack without intrusive application-level changes.
 * Difficulty in optimizing storage placement across heterogeneous storage tiers (for example, SSDs, object storage, and cloud storage).
fern/fern/pages/kubernetes/fluxcd.mdx-28-28 (1)

28-28: Fix grammatical issue in sentence.

The sentence has awkward phrasing: "First, follow to [See Install..." should likely be "First, see Install Dynamo Kubernetes Platform." or similar.

Suggested fix
-First, follow to [See Install Dynamo Kubernetes Platform](./installation_guide).
+First, see [Install Dynamo Kubernetes Platform](./installation_guide).
fern/fern/pages/kubernetes/fluxcd.mdx-69-69 (1)

69-69: Terminology: "CRD" should be "CR".

A CRD (Custom Resource Definition) defines the schema; a CR (Custom Resource) is an instance of that schema. When updating a deployment, you update the CR (DynamoGraphDeployment instance), not the CRD.

Suggested fix
-To update your pipeline, just update the associated DynamoGraphDeployment CRD
+To update your pipeline, just update the associated DynamoGraphDeployment CR
fern/fern/pages/kubernetes/installation_guide.mdx-333-338 (1)

333-338: Capitalize sentence beginning.

The sentence starting with "just add" should begin with a capital letter for proper grammar.

Proposed fix
-just add the following to the helm install command:
+Just add the following to the helm install command:
fern/fern/pages/kubernetes/deployment/create_deployment.mdx-157-230 (1)

157-230: Step numbering is inconsistent - jumps from Step 3 to Step 6.

The document has Steps 1, 2, and 3, but then jumps directly to Step 6 at line 230. Steps 4 and 5 are missing, which will confuse readers following the guide sequentially.

Proposed fix
-## Step 6: Deploy LoRA Adapters (Optional)
+## Step 4: Deploy LoRA Adapters (Optional)

Alternatively, add the missing Steps 4 and 5 if there was intended content for them.

fern/fern/pages/backends/trtllm/gpt-oss.mdx-216-216 (1)

216-216: Typo: "ususally" should be "usually".

Suggested fix
-is that the application has a set of tools to aid the assistant provide accurate answer, and it is ususally
+is that the application has a set of tools to aid the assistant provide accurate answer, and it is usually
fern/fern/pages/multimodal/vllm.mdx-166-166 (1)

166-166: GitHub-style alert syntax may not render in fern MDX.

The > [!NOTE] syntax is GitHub Flavored Markdown and may not render correctly in fern's MDX environment. Consider using fern's Callout component for consistency with other callouts in this file (like lines 12-16).

Suggested fix
-> [!NOTE] Disaggregation is currently only confirmed to work with LLaVA. Qwen2.5-VL is not confirmed to be supported.
+<Callout intent="info">
+Disaggregation is currently only confirmed to work with LLaVA. Qwen2.5-VL is not confirmed to be supported.
+</Callout>
fern/fern/pages/backends/trtllm/gpt-oss.mdx-163-174 (1)

163-174: Decode worker command missing --max-batch-size parameter.

Line 122 documents that decode-specific arguments include --max-batch-size 128, but the manual launch command for the decode worker (lines 163-174) omits this parameter while the prefill worker includes its --max-batch-size 32.

Suggested fix
 CUDA_VISIBLE_DEVICES=4,5,6,7 python3 -m dynamo.trtllm \
   --model-path /model \
   --served-model-name openai/gpt-oss-120b \
   --extra-engine-args examples/backends/trtllm/engine_configs/gpt-oss-120b/decode.yaml \
   --dyn-reasoning-parser gpt_oss \
   --dyn-tool-call-parser harmony \
   --disaggregation-mode decode \
   --max-num-tokens 16384 \
+  --max-batch-size 128 \
   --free-gpu-memory-fraction 0.9 \
   --tensor-parallel-size 4 \
   --expert-parallel-size 4
fern/fern/pages/kubernetes/quickstart.mdx-176-198 (1)

176-198: Inconsistent dynamoNamespace values in example YAML.

The example shows dynamoNamespace: my-llm for Frontend (line 178) and dynamoNamespace: dynamo-dev for VllmDecodeWorker (line 185). While the doc mentions these namespaces are independent (line 23), having different values in the same deployment example may confuse users following this as a template. Consider using consistent values in this introductory example.

Suggested fix
     Frontend:
-      dynamoNamespace: my-llm
+      dynamoNamespace: dynamo-dev
       componentType: frontend
fern/fern/pages/multimodal/index.mdx-18-21 (1)

18-21: Empty "Backend Documentation" section.

The section header exists but contains no content (only blank lines before the Support Matrix). Either add the intended content or remove this section header.

Suggested fix (if removing)
-## Backend Documentation
-
-
-
 ## Support Matrix
fern/fern/pages/backends/trtllm/gpt-oss.mdx-176-176 (1)

176-176: Section numbering skips from 4 to 6.

The guide jumps from "4. Launch the Deployment" directly to "6. Verify the Deployment is Ready", skipping section 5.

Suggested fix
-### 6. Verify the Deployment is Ready
+### 5. Verify the Deployment is Ready

And update subsequent sections accordingly (6→5, 7→6, 8→7).

fern/fern/pages/fault-tolerance/request_migration.mdx-47-47 (1)

47-47: Grammar error: "This creates accumulates" is incorrect.

The sentence appears to have a word missing or incorrect construction.

Suggested fix
-2. **Response Tracking**: As each response arrives from the worker, the migration system extracts the newly generated tokens and appends them to the request's token sequence. This creates accumulates all tokens that have been generated.
+2. **Response Tracking**: As each response arrives from the worker, the migration system extracts the newly generated tokens and appends them to the request's token sequence. This accumulates all tokens that have been generated.
fern/fern/pages/development/backend-guide.mdx-104-104 (1)

104-104: Typo: "generat" should be "generate".

There's a typo in the example that should be corrected to avoid confusion.

Suggested fix
-Node 2: namespace: llama3-1-8b, component: backend, endpoint: generat, model: /data/Llama-3.1-8B-Instruct/
+Node 2: namespace: llama3-1-8b, component: backend, endpoint: generate, model: /data/Llama-3.1-8B-Instruct/
fern/fern/pages/kubernetes/api_reference.mdx-17-23 (1)

17-23: Remove duplicate package description in auto-generated documentation.

Lines 17 and 22 contain identical package descriptions. The duplicate originates from two Go source files both providing the same package-level documentation:

  • deploy/operator/api/v1alpha1/groupversion_info.go
  • deploy/operator/api/v1alpha1/dynamographdeploymentrequest_types.go

Remove the generic package description from dynamographdeploymentrequest_types.go and retain only the DynamoGraphDeploymentRequest-specific context, keeping the description unique to that file's purpose.

fern/fern/pages/multimodal/trtllm.mdx-259-259 (1)

259-259: Potential broken link after migration.

The comment references docs/backends/trtllm/README.md#build-container, but since this PR migrates documentation to fern/, this path may become invalid after the Sphinx docs are removed. Consider updating to reference the Fern documentation path or a stable external URL.

fern/fern/pages/agents/tool-calling.mdx-50-50 (1)

50-50: Trailing comma in Jamba models list.

The Jamba parser row ends with a trailing comma after AI21-Jamba-*-1.7, which appears unintentional.

✏️ Suggested fix
-| jamba |  ai21labs/AI21-Jamba-*-1.5, ai21labs/AI21-Jamba-*-1.6, ai21labs/AI21-Jamba-*-1.7, |
+| jamba |  ai21labs/AI21-Jamba-*-1.5, ai21labs/AI21-Jamba-*-1.6, ai21labs/AI21-Jamba-*-1.7 |
fern/fern/pages/benchmarks/kv-router-ab-testing.mdx-104-104 (1)

104-104: External link may become stale after migration.

The link to docs/kubernetes/installation_guide.md in the main branch may break if the Sphinx docs are removed during migration. Consider updating to reference the new Fern documentation path once the migration is complete.

fern/fern/pages/backends/vllm/README.mdx-59-59 (1)

59-59: Typographical error: Extra word "our".

The phrase "all of our the common" contains a typo.

📝 Suggested fix
-Below we provide a guide that lets you run all of our the common deployment patterns on a single node.
+Below we provide a guide that lets you run all the common deployment patterns on a single node.
fern/fern/pages/backends/vllm/README.mdx-167-167 (1)

167-167: Update vLLM documentation link to use /latest/ for automatic version tracking.

The linked documentation for vLLM v0.9.2 is significantly outdated. The current latest version is v0.13.0 (released December 2025). Consider updating the URL to use the /en/latest/ path instead to ensure the documentation reference stays current as vLLM continues to release updates frequently.

fern/fern/pages/kvbm/kvbm_components.mdx-39-42 (1)

39-42: Fix grammatical error in data flow description.

Line 40 has awkward phrasing that makes the sentence incomplete.

📝 Suggested fix
 **Device → Host (Offload)**
-* Triggered explicitly requested to offload by the connector scheduler.
+* Triggered when explicitly requested to offload by the connector scheduler.
 * Worker allocates a Host block and performs CUDA D2H/Custom Kernel copy.
fern/fern/pages/backends/sglang/gpt-oss.mdx-10-11 (1)

10-11: Fix typo: "ues" → "use".

📝 Suggested fix
 The gpt-oss-120b guide for SGLang is largely identical to the [guide for vLLM](/additional-resources/backend-details/v-llm/gpt-oss),
-please ues the vLLM guide as a reference with the different deployment steps as highlighted below:
+please use the vLLM guide as a reference with the different deployment steps as highlighted below:
fern/fern/pages/backends/trtllm/llama4_plus_eagle.mdx-25-28 (1)

25-28: Incomplete sentence in setup instructions.

Line 27 ends with "based:" which appears incomplete. Consider completing the sentence for clarity.

Suggested fix
 Assuming you have already allocated your nodes via `salloc`, and are
 inside an interactive shell on one of the allocated nodes, set the
-following environment variables based:
+following environment variables based on your setup:
fern/fern/pages/api/nixl_connect/rdma_metadata.mdx-21-26 (1)

21-26: Incorrect link for WritableOperation.

Line 24 links WritableOperation to write_operation instead of writable_operation. The pairing documentation should link each class to its own documentation page.

Proposed fix
 <Callout intent="success">
 Classes using `RdmaMetadata` objects must be paired correctly.
 [`ReadableOperation`](readable_operation) with [`ReadOperation`](read_operation), and
-[`WritableOperation`](write_operation) with [`WriteOperation`](write_operation).
+[`WritableOperation`](writable_operation) with [`WriteOperation`](write_operation).
 Incorrect pairing will result in an error being raised.
 </Callout>
fern/fern/pages/benchmarks/benchmarking.mdx-485-489 (1)

485-489: Fix numbered list - missing item 2.

The troubleshooting list skips from item 1 to item 3. Either add the missing item or renumber the list sequentially.

Proposed fix
 1. **Service not found**: Ensure your DynamoGraphDeployment frontend service is running
-3. **PVC access**: Check that `dynamo-pvc` is properly configured and accessible
-4. **Image pull issues**: Ensure the Docker image is accessible from the cluster
-5. **Resource constraints**: Adjust resource limits if the job is being evicted
+2. **PVC access**: Check that `dynamo-pvc` is properly configured and accessible
+3. **Image pull issues**: Ensure the Docker image is accessible from the cluster
+4. **Resource constraints**: Adjust resource limits if the job is being evicted
fern/fern/pages/observability/README.mdx-37-37 (1)

37-37: Minor grammatical fix needed.

"Documentations" is non-standard; use "Documentation" instead.

Suggested fix
-## Observability Documentations
+## Observability Documentation
fern/fern/pages/observability/health-checks.mdx-60-62 (1)

60-62: Port inconsistency in example.

The quickstart section (line 34) states the frontend default port is 8000, but this example uses port 8080. This inconsistency could confuse users.

Suggested fix
-curl -s localhost:8080/live -q | jq
+curl -s localhost:8000/live -q | jq
fern/fern/pages/observability/health-checks.mdx-79-85 (1)

79-85: Copy-paste error and port inconsistency.

  1. Line 79: The note incorrectly says "Frontend liveness" but this section is about "Frontend Health Check"
  2. Line 84: Uses port 8080, but should be 8000 to match the documented default
Suggested fix
-> **Note**: Frontend liveness doesn't depend on worker health or liveness only on the Frontend service itself.
+> **Note**: Frontend health doesn't depend on worker health or liveness, only on the Frontend service itself.
 
 ### Example Request
 

-curl -v localhost:8080/health -q | jq
+curl -v localhost:8000/health -q | jq

fern/fern/pages/observability/README.mdx-56-56 (1)

56-56: Malformed link text.

The link text do../kubernetes/observability/metrics.md appears to be a typo or incomplete path. This should be corrected to display meaningful text.

Suggested fix
-For Kubernetes-specific setup and configuration, see [do../kubernetes/observability/metrics.md](../kubernetes/observability/metrics).
+For Kubernetes-specific setup and configuration, see [Kubernetes Observability Metrics](../kubernetes/observability/metrics).
fern/fern/pages/api/nixl_connect/descriptor.mdx-10-11 (1)

10-11: Typo: "NIXL-base" should be "NIXL-based".

Line 10 has a typo that should be corrected for consistency with other documentation.

Proposed fix
-Memory descriptor that ensures memory is registered with the NIXL-base I/O subsystem.
+Memory descriptor that ensures memory is registered with the NIXL-based I/O subsystem.
fern/fern/pages/api/nixl_connect/descriptor.mdx-37-39 (1)

37-39: Minor grammar fix: missing pronoun "it".

Proposed fix
-When the descriptor is assigned to a NIXL operation, it will be automatically registered if was not explicitly registered.
+When the descriptor is assigned to a NIXL operation, it will be automatically registered if it was not explicitly registered.
fern/fern/pages/api/nixl_connect/descriptor.mdx-21-21 (1)

21-21: Minor grammar fix: use hyphen in compound adjective.

"CPU addressable" should be hyphenated when used as a compound adjective before a noun.

Proposed fix
- 3. From a Python `bytes` object. Memory is assumed to reside in CPU addressable host memory.
+ 3. From a Python `bytes` object. Memory is assumed to reside in CPU-addressable host memory.
fern/fern/pages/benchmarks/sla_driven_profiling.mdx-256-256 (1)

256-256: Typo: "interplation" should be "interpolation".

Proposed fix
-- `selected_decode_interpolation/decode_itl_interplation.png`: ITL vs KV usage and context length for the recommended decode engine
+- `selected_decode_interpolation/decode_itl_interpolation.png`: ITL vs KV usage and context length for the recommended decode engine
fern/fern/pages/backends/vllm/gpt-oss.mdx-115-116 (1)

115-116: Typo: "ususally" should be "usually".

Proposed fix
-is that the application has a set of tools to aid the assistant provide accurate answer, and it is ususally
-multi-turn as it involves tool selection and generation based on the tool result. Below is an example
+is that the application has a set of tools to help the assistant provide accurate answers, and it is usually
+multi-turn as it involves tool selection and generation based on the tool result. Below is an example
fern/fern/pages/backends/vllm/speculative_decoding.mdx-88-105 (1)

88-105: Example output format doesn't match chat completions API response.

The curl request targets /v1/chat/completions, but the example response uses the completions format with a "text" field. The chat completions endpoint returns a "message" object instead:

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "..."
      },
      "index": 0,
      "finish_reason": "stop"
    }
  ]
}

This may confuse users trying to parse the response programmatically.

📝 Suggested fix
 {
   "id": "cmpl-3e87ea5c-010e-4dd2-bcc4-3298ebd845a8",
   "choices": [
     {
-      "text": "In cherry blossom's gentle breeze ... A delicate balance of life and death, as petals fade, and new life breathes.",
+      "message": {
+        "role": "assistant",
+        "content": "In cherry blossom's gentle breeze ... A delicate balance of life and death, as petals fade, and new life breathes."
+      },
       "index": 0,
       "finish_reason": "stop"
     }
   ],
fern/fern/pages/performance/tuning.mdx-38-41 (1)

38-41: Typo: missing word "leads".

The sentence is missing a word, making it grammatically incorrect.

 <Callout intent="info">
-for decode-only engines, sometimes larger number of GPUs has to larger KV cache per GPU and more decoding requests running in parallel, which leads to both better throughput/GPU and better latency/user.
+For decode-only engines, sometimes a larger number of GPUs leads to larger KV cache per GPU and more decoding requests running in parallel, which leads to both better throughput/GPU and better latency/user.
 For example, for Llama-3.3-70b NVFP4 quantization on B200 in vLLM with 0.9 free GPU memory fraction:
 </Callout>

Also note: sentence should start with capital "F" and include article "a" before "larger number".

fern/fern/pages/kvbm/trtllm-setup.mdx-129-133 (1)

129-133: Inconsistent metric naming/description for h2d suffix.

The h2d suffix is used inconsistently:

  • Line 130: kvbm_offload_blocks_h2d described as "host to disk"
  • Line 133: kvbm_onboard_blocks_h2d described as "host to device"

The standard convention is h2d = "host to device". Please verify the correct metric names and descriptions. If line 130 truly means "host to disk", consider renaming the metric to something like h2disk or h2d_disk for clarity.

fern/fern/pages/api/nixl_connect/writable_operation.mdx-36-37 (1)

36-37: Minor grammatical issue in code comment.

📝 Suggested fix
-        # Wait the remote worker to complete its write operation to local_tensor.
+        # Wait for the remote worker to complete its write operation to local_tensor.
fern/fern/pages/backends/trtllm/multinode/multinode-examples.mdx-79-83 (1)

79-83: Typo: "iamge" should be "image".

Line 81 has a typo in the comment.

Fix
 ```bash
 # NOTE: IMAGE must be set manually for now
-# To build an iamge, see the steps here:
+# To build an image, see the steps here:
 # https://github.com/ai-dynamo/dynamo/tree/main/docs/backends/trtllm/README.md#build-container
 export IMAGE="<dynamo_trtllm_image>"
fern/fern/pages/design-docs/architecture.mdx-28-28 (1)

28-28: Grammar: subject-verb agreement.

"A disaggregated approach that separate" should be "separates" (singular verb to match "approach").

Fix
-- *GPU underutilization*: Traditional monolithic inference pipelines often leave GPUs idle due to the imbalance between prefill and decode stages. Prefill (which generates large prompt embeddings) is highly compute-intensive, while decode (which generates tokens) is latency-sensitive. A disaggregated approach that separate prefill and decode ensures optimal GPU utilization and increases overall throughput ([DistServe](https://arxiv.org/abs/2401.09670)).
+- *GPU underutilization*: Traditional monolithic inference pipelines often leave GPUs idle due to the imbalance between prefill and decode stages. Prefill (which generates large prompt embeddings) is highly compute-intensive, while decode (which generates tokens) is latency-sensitive. A disaggregated approach that separates prefill and decode ensures optimal GPU utilization and increases overall throughput ([DistServe](https://arxiv.org/abs/2401.09670)).
fern/fern/pages/design-docs/architecture.mdx-80-80 (1)

80-80: Typo: "preceeding" should be "preceding".

Also, "KV aware routing" should be hyphenated as "KV-aware routing" for consistency with compound adjective usage elsewhere in the document.

Fix
-Existing routing methods, including load-based routing, overlook the specific properties of LLMs that could improve performance. Addressing this, routing user queries to workers with the highest KV cache hit rate (rather than simply the least busy node) allows for immediate processing, even under heavy load. The preceeding figures illustrate the effectiveness of KV aware routing on 100,000 real R1 user queries, achieving a 3x improvement in TTFT and a 2x reduction in average request latency. Depending on traffic, this approach can also enhance throughput.
+Existing routing methods, including load-based routing, overlook the specific properties of LLMs that could improve performance. Addressing this, routing user queries to workers with the highest KV cache hit rate (rather than simply the least busy node) allows for immediate processing, even under heavy load. The preceding figures illustrate the effectiveness of KV-aware routing on 100,000 real R1 user queries, achieving a 3x improvement in TTFT and a 2x reduction in average request latency. Depending on traffic, this approach can also enhance throughput.
fern/fern/pages/backends/trtllm/multinode/multinode-examples.mdx-210-223 (1)

210-223: Typo: "succesfully" should be "successfully".

Line 213 has a spelling error.

Fix
    You can see each rank's output prefixed with the rank at the start of each log line
-   until the model succesfully finishes loading:
+   until the model successfully finishes loading:
fern/fern/pages/kubernetes/README.mdx-87-108 (1)

87-108: Minor: Redundant namespace creation.

Line 91 creates the namespace with kubectl create namespace ${NAMESPACE}, but the platform installation step (line 65) already uses --create-namespace. If users follow both sections sequentially with the same namespace, the explicit kubectl create namespace will fail with an "already exists" error.

Consider either removing line 91 or adding a note that this step is only needed if deploying to a different namespace than the platform.

Suggested clarification
 ## 3. Deploy Your First Model

 ```bash
-export NAMESPACE=dynamo-system
-kubectl create namespace ${NAMESPACE}
+# Use same namespace as platform, or create a new one for model isolation
+export NAMESPACE=dynamo-system  # or your preferred namespace
+# kubectl create namespace ${NAMESPACE}  # Only if using a different namespace

 # to pull model from HF
fern/fern/pages/kubernetes/README.mdx-177-198 (1)

177-198: Inconsistent dynamoNamespace values in example.

The example shows Frontend using dynamoNamespace: my-llm (line 178) while VllmDecodeWorker uses dynamoNamespace: dynamo-dev (line 185). Based on the terminology section (lines 18-22), components within the same deployment typically share a Dynamo namespace for service discovery.

Consider using the same dynamoNamespace value for both services to avoid confusion, or add a comment explaining when different namespaces would be appropriate.

Suggested fix
     Frontend:
       dynamoNamespace: my-llm
       componentType: frontend
       replicas: 1
       extraPodSpec:
         mainContainer:
           image: your-image
     VllmDecodeWorker:  # or SGLangDecodeWorker, TrtllmDecodeWorker
-      dynamoNamespace: dynamo-dev
+      dynamoNamespace: my-llm  # Should match Frontend for service discovery
       componentType: worker
fern/fern/pages/design-docs/architecture.mdx-44-48 (1)

44-48: Fix incorrect link reference: use hyphen instead of underscore.

The link disagg_serving does not match the actual file disagg-serving.mdx. Change line 44 to:

- [Dynamo Disaggregated Serving](disagg-serving)
fern/fern/pages/kvbm/kvbm_design_deepdive.mdx-226-226 (1)

226-226: Minor grammar fix: hyphenate "high-level".

"High level" should be hyphenated when used as a compound adjective before a noun.

Suggested fix
-Now, to enable fast lookup and dynamic tiering, storage vendors may build internal data structures using the received event stream. Here is a high level conceptual design:
+Now, to enable fast lookup and dynamic tiering, storage vendors may build internal data structures using the received event stream. Here is a high-level conceptual design:
fern/fern/pages/api/nixl_connect/README.mdx-46-48 (1)

46-48: Fix grammatical error in the description.

Line 48 is missing a word. "registered by a remote worker to writable" should be "registered by a remote worker to be writable" or "registered by a remote worker as writable."

Suggested fix
  4. **Write to registered, remote memory**:

-    Write local memory buffer(s) to remote memory buffer(s) registered by a remote worker to writable.
+    Write local memory buffer(s) to remote memory buffer(s) registered by a remote worker as writable.
fern/fern/pages/kvbm/kvbm_design_deepdive.mdx-28-31 (1)

28-31: Fix typo: missing space in "BlockLayouttrait".

Line 30 has a typo where "BlockLayout" and "trait" are concatenated without a space.

Suggested fix
-Each block is a 2D array `[num_layers][page_size × inner_dim]`. `BlockLayouttrait` abstracts the memory layout. The default implementation,`FullyContiguous`, stores all layers for all blocks in one region with alignment-aware stride computation:
+Each block is a 2D array `[num_layers][page_size × inner_dim]`. The `BlockLayout` trait abstracts the memory layout. The default implementation, `FullyContiguous`, stores all layers for all blocks in one region with alignment-aware stride computation:
fern/fern/pages/backends/trtllm/README.mdx-59-62 (1)

59-62: Grammar: extra word "the".

Line 61: "all of our the common" should be "all of the common" or "all our common".

🔤 Proposed fix
-Below we provide a guide that lets you run all of our the common deployment patterns on a single node.
+Below we provide a guide that lets you run all the common deployment patterns on a single node.
fern/fern/pages/design-docs/disagg-serving.mdx-78-81 (1)

78-81: Typo: "comptued" should be "computed".

Line 80 contains a typo in the mermaid diagram label.

🔤 Proposed fix
-    P-->>D: Remote NIXL write for comptued KV blocks (non-block)
+    P-->>D: Remote NIXL write for computed KV blocks (non-block)
fern/fern/pages/planner/sla_planner_quickstart.mdx-471-485 (1)

471-485: Minor: Missing period after "etc".

In American English style, "etc" should have a period.

🔤 Proposed fix
-By default, profiling jobs save essential data to ConfigMaps for planner integration. For advanced users who need access to detailed artifacts (logs, performance plots, AIPerf results, etc), configure the DGDR to use `dynamo-pvc`.
+By default, profiling jobs save essential data to ConfigMaps for planner integration. For advanced users who need access to detailed artifacts (logs, performance plots, AIPerf results, etc.), configure the DGDR to use `dynamo-pvc`.
fern/fern/pages/kvbm/kvbm_architecture.mdx-17-18 (1)

17-18: Typo: "eviction was on policies" appears corrupted.

This phrase doesn't make grammatical sense. It likely should be "eviction based on policies" or "eviction policies".

🔤 Proposed fix
-The middle layer, the KVBM layer, encapsulates the core logic of the KV block manager and serves as the runtime substrate for managing block memory. The KVBM adapter layer normalizes the representations and data layout for the incoming requests across runtimes and forwards them to the core memory manager. The KVBM and the core modules implement required internal functionality, such as table lookups, memory allocation, block layout management, lifecycle, and state transitions and block reuse or eviction was on policies.
+The middle layer, the KVBM layer, encapsulates the core logic of the KV block manager and serves as the runtime substrate for managing block memory. The KVBM adapter layer normalizes the representations and data layout for the incoming requests across runtimes and forwards them to the core memory manager. The KVBM and the core modules implement required internal functionality, such as table lookups, memory allocation, block layout management, lifecycle, state transitions, and block reuse or eviction based on policies.
fern/fern/pages/backends/trtllm/README.mdx-163-173 (1)

163-173: Duplicate sections: Client and Benchmarking appear twice.

Lines 163-173 (Client and Benchmarking) are duplicated at lines 207-216. Consider removing one set.

🔧 Suggested fix

Remove the duplicate "Client" and "Benchmarking" sections (lines 207-216), keeping only the first occurrence (lines 163-173). Alternatively, if the second occurrence is intentional as a summary section, remove the first occurrence to avoid redundancy in the Table of Contents flow.

 ## Client

 See [client](../../backends/sglang/README#testing-the-deployment) section to learn how to send request to the deployment.

 NOTE: To send a request to a multi-node deployment, target the node which is running `python3 -m dynamo.frontend <args>`.

 ## Benchmarking

 To benchmark your deployment with AIPerf, see this utility script, configuring the
 `model` name and `host` based on your deployment: [perf.sh](https://github.com/ai-dynamo/dynamo/tree/main/benchmarks/llm/perf.sh)

-## Client
-
-See [client](../../backends/sglang/README#testing-the-deployment) section to learn how to send request to the deployment.
-
-NOTE: To send a request to a multi-node deployment, target the node which is running `python3 -m dynamo.frontend <args>`.
-
-## Benchmarking
-
-To benchmark your deployment with AIPerf, see this utility script, configuring the
-`model` name and `host` based on your deployment: [perf.sh](https://github.com/ai-dynamo/dynamo/tree/main/benchmarks/llm/perf.sh)
fern/fern/pages/planner/sla_planner.mdx-66-73 (1)

66-73: Grammar: subject-verb agreement.

Line 68: "three load prediction model is supported" should be "three load prediction models are supported".

🔤 Proposed fix
-The SLA planner use load predictor to predict the number of requests, ISL, and OSL in the next adjustment interval. Currently, three load prediction model is supported:
+The SLA planner uses a load predictor to predict the number of requests, ISL, and OSL in the next adjustment interval. Currently, three load prediction models are supported:
fern/fern/pages/planner/sla_planner.mdx-96-105 (1)

96-105: Minor grammar fixes needed.

  • Line 99: "maybe" should be "may be" (two words)
  • Line 102: "calculate" should be "calculates"
🔤 Proposed fixes
-- **ITL**: actual ITL maybe affected by chunked small prefill request in decode engine.
+- **ITL**: actual ITL may be affected by chunked small prefill requests in the decode engine.
-SLA planner calculate the correction factor with
+SLA planner calculates the correction factor with

@nealvaidya
Contributor

@coderabbitai we'll address content issues in a follow-up PR. For now, review for issues related to

  • rendering
  • links and filenames
  • configuration of fern itself

@coderabbitai
Contributor

coderabbitai bot commented Jan 15, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

@nealvaidya, I'll review the Fern configuration, rendering setup, links, and filenames. Let me check for potential issues.


🏁 Script executed:

#!/bin/bash
# Check Fern configuration files for syntax and structure
echo "=== Checking fern.config.json ==="
cat fern/fern/fern.config.json | jq '.' 2>&1 || echo "JSON parsing failed"

echo -e "\n=== Checking docs.yml structure ==="
cat fern/fern/docs.yml

echo -e "\n=== Checking .gitignore ==="
cat fern/fern/.gitignore

Length of output: 2352


🏁 Script executed:

#!/bin/bash
# Check all MDX files for frontmatter and basic syntax issues
echo "=== Checking MDX files for frontmatter ==="
fd -e mdx -e md . fern/fern/pages fern/fern/pages-v* --exec sh -c '
  file="$1"
  echo "Checking: $file"
  # Check if file starts with frontmatter (---)
  if ! head -n 1 "$file" | grep -q "^---$"; then
    echo "  ⚠️  Missing frontmatter opening"
  fi
  # Check for unclosed frontmatter
  frontmatter_count=$(grep -c "^---$" "$file" 2>/dev/null || echo 0)
  if [ "$frontmatter_count" -eq 1 ]; then
    echo "  ⚠️  Unclosed frontmatter (only one --- found)"
  fi
' _ {}

Length of output: 6570


🏁 Script executed:

#!/bin/bash
# Check for internal links and verify they exist
echo "=== Checking internal links in MDX files ==="
fd -e mdx -e md . fern/fern/pages --exec sh -c '
  file="$1"
  # Extract markdown links [text](path) - looking for relative paths
  grep -oP "\[([^\]]+)\]\(([^)]+)\)" "$file" 2>/dev/null | while read -r link; do
    # Extract the path from the link
    path=$(echo "$link" | grep -oP "\]\(\K[^)]+")
    
    # Skip external links (http/https), anchors (#), and mailto
    if echo "$path" | grep -qE "^(https?://|#|mailto:)"; then
      continue
    fi
    
    # Get directory of current file
    dir=$(dirname "$file")
    
    # Resolve relative path
    if [[ "$path" == /* ]]; then
      # Absolute path from root
      target="fern/fern/pages${path}"
    else
      # Relative path
      target="$dir/$path"
    fi
    
    # Remove .mdx extension if not present and check both .mdx and .md
    target_base="${target%.mdx}"
    target_base="${target_base%.md}"
    
    if [ ! -f "$target" ] && [ ! -f "${target_base}.mdx" ] && [ ! -f "${target_base}.md" ] && [ ! -d "$target_base" ]; then
      echo "⚠️  $file -> $path (target not found: $target)"
    fi
  done
' _ {}

Length of output: 41314


🏁 Script executed:

#!/bin/bash
# Check for common MDX rendering issues
echo "=== Checking for potential MDX rendering issues ==="

echo -e "\n--- Files with unbalanced code fences ---"
fd -e mdx -e md . fern/fern/pages fern/fern/pages-v* --exec sh -c '
  file="$1"
  fence_count=$(grep -c "^\`\`\`" "$file" 2>/dev/null || echo 0)
  if [ $((fence_count % 2)) -ne 0 ]; then
    echo "⚠️  $file has unbalanced code fences (count: $fence_count)"
  fi
' _ {}

echo -e "\n--- Files with potential unclosed JSX/HTML tags ---"
fd -e mdx -e md . fern/fern/pages --exec sh -c '
  file="$1"
  # Look for common unclosed tags (basic check)
  if grep -q "<Callout" "$file" 2>/dev/null; then
    callout_open=$(grep -o "<Callout" "$file" | wc -l)
    callout_close=$(grep -o "</Callout>" "$file" | wc -l)
    if [ "$callout_open" -ne "$callout_close" ]; then
      echo "⚠️  $file: Callout tags mismatch (open: $callout_open, close: $callout_close)"
    fi
  fi
' _ {}

Length of output: 1278


🏁 Script executed:

#!/bin/bash
# Check version path consistency in docs.yml
echo "=== Checking version paths in docs.yml match actual directories ==="

# Extract version paths from docs.yml
grep -A 1 "display-name:" fern/fern/docs.yml | grep "path:" | sed 's/.*path: //' | while read -r version_path; do
  full_path="fern/fern/${version_path}"
  if [ ! -d "$full_path" ]; then
    echo "⚠️  Version path in docs.yml not found: $full_path"
  else
    echo "✓ $full_path exists"
  fi
done

Length of output: 1219


🏁 Script executed:

#!/bin/bash
# Check for file naming consistency
echo "=== Checking file naming patterns ==="

echo -e "\n--- Files with spaces in names ---"
fd . fern/fern --type f --exec sh -c '
  if echo "$1" | grep -q " "; then
    echo "⚠️  File has spaces: $1"
  fi
' _ {}

echo -e "\n--- Files with unusual characters ---"
fd . fern/fern --type f --exec sh -c '
  # Check for characters that might cause issues
  if echo "$1" | grep -qE "[^a-zA-Z0-9/_.-]"; then
    echo "⚠️  File has unusual characters: $1"
  fi
' _ {}

echo -e "\n--- Checking for README vs index naming consistency ---"
fd -e mdx "README" fern/fern/pages
fd -e mdx "index" fern/fern/pages

Length of output: 490


@nealvaidya, I've completed the review focusing on rendering, links/filenames, and Fern configuration. Here are the issues found:

🚨 Critical Issues

1. Missing Version Configuration Files

The docs.yml references version files that don't exist:

versions:
  - display-name: Next
    path: ./versions/latest.yml
  - display-name: v0.7.1
    path: ./versions/v0.7.1.yml
  # ... (all 16 version files are missing)

Impact: Fern build will fail without these version configuration files.
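
Until the real per-version navigation files are written, the build could be unblocked with generated stubs. A rough sketch (the `navigation: []` stub body is an assumption about Fern's minimal version-file schema, not a verified example — replace each stub with the real navigation before publishing):

```python
import re
from pathlib import Path

def create_version_stubs(fern_root: Path) -> list[Path]:
    """Create a stub for every versions/*.yml file that docs.yml
    references but that does not exist yet. Returns the created paths."""
    docs_yml = (fern_root / "docs.yml").read_text()
    created = []
    # Match entries like: `path: ./versions/v0.7.1.yml`
    for rel in re.findall(r"path:\s*\./(versions/\S+\.yml)", docs_yml):
        target = fern_root / rel
        if not target.exists():
            target.parent.mkdir(parents=True, exist_ok=True)
            # Assumed minimal schema; fill in real navigation per version.
            target.write_text("navigation: []\n")
            created.append(target)
    return created
```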

2. Unbalanced Code Fences (Will break rendering)

  • fern/fern/pages/performance/aiconfigurator.mdx - 11 code fences (odd number)
  • fern/fern/pages/kubernetes/dynamo_operator.mdx - 9 code fences (odd number)
  • fern/fern/pages/design-docs/distributed_runtime.mdx - 1 code fence (odd number)

3. Missing Frontmatter

  • fern/fern/pages/kubernetes/api_reference.mdx - No frontmatter --- delimiters

⚠️ High-Priority Issues

4. Broken Internal Links (200+ occurrences)

Common patterns:

  • Absolute paths that don't exist: /components/planner/sla-planner-quick-start, /kubernetes-deployment/deployment-guide/detailed-installation-guide, etc.
  • Directory name mismatches: Links use underscores but files use hyphens
    • ../design_docs/distributed_runtime → should be ../design-docs/distributed_runtime
    • ../fault_tolerance/request_migration → should be ../fault-tolerance/request_migration
  • Relative path issues: Many ./ prefix links in getting-started that don't resolve correctly

Example broken links:

# In planner files:
[link](/components/planner/sla-planner-quick-start) ❌
# In observability files:
[link](../design_docs/distributed_runtime) ❌ (should be design-docs)

5. Inconsistent Directory Index Naming

  • Most directories use README.mdx (7 occurrences)
  • One directory uses index.mdx (multimodal)
  • Recommendation: Fern typically expects index.mdx - consider standardizing

✅ What's Working Well

  • JSON and YAML syntax is valid
  • No files with spaces or unusual characters in names
  • .gitignore configuration looks good
  • Logo paths and favicon configuration are correct
  • Color scheme and branding setup is proper

📋 Recommended Actions

  1. Create missing version YAML files in fern/fern/versions/ directory
  2. Fix unbalanced code fences in the 3 identified files
  3. Add frontmatter to api_reference.mdx
  4. Fix broken internal links - this is extensive and may require a systematic approach:
    • Update absolute paths to match new structure
    • Fix underscore/hyphen mismatches in directory names
    • Verify relative paths resolve correctly
  5. Consider standardizing to index.mdx instead of README.mdx for directory indexes
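
For item 4, the underscore/hyphen mismatches are mechanical enough to script. A sketch, assuming standard markdown link syntax and Fern's extensionless relative page links (the per-segment swap heuristic is mine, so each rewrite should be verified before committing):

```python
import re
from itertools import product
from pathlib import Path

LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)#\s]+)(#[^)]*)?\)")

def page_exists(path: Path) -> bool:
    # Fern links omit extensions, so check the bare path plus .mdx/.md.
    return (path.exists() or path.with_suffix(".mdx").exists()
            or path.with_suffix(".md").exists())

def fix_links(pages_root: Path) -> int:
    """Rewrite relative link targets whose page is missing but whose
    underscore/hyphen-swapped variant exists. Returns the fix count."""
    fixed = 0
    for page in pages_root.rglob("*.mdx"):
        def repl(m: re.Match) -> str:
            nonlocal fixed
            label, target, anchor = m.group(1), m.group(2), m.group(3) or ""
            if target.startswith(("http://", "https://", "mailto:", "/")):
                return m.group(0)  # external/absolute links need separate handling
            if page_exists(page.parent / target):
                return m.group(0)
            # Try every per-segment underscore<->hyphen swap combination.
            options = [(seg, seg.replace("_", "-"), seg.replace("-", "_"))
                       for seg in target.split("/")]
            for combo in product(*options):
                candidate = "/".join(combo)
                if page_exists(page.parent / candidate):
                    fixed += 1
                    return f"[{label}]({candidate}{anchor})"
            return m.group(0)  # leave genuinely broken links for manual review

        text = page.read_text()
        new_text = LINK_RE.sub(repl, text)
        if new_text != text:
            page.write_text(new_text)
    return fixed
```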

Would you like me to help identify which version YAML files need to be created, or provide a more detailed breakdown of specific link fixes needed?

@Jont828 Jont828 force-pushed the fern-migration-parallel branch from d4d16ef to 5702911 Compare January 15, 2026 20:54
@grahamking
Contributor

Wow, big PR!

  1. Could you link to the fern tool's repo and docs, for those of us not familiar with it.
  2. The docs would always have to be in the docs/ folder, "fern" doesn't exactly scream "the docs are in this folder".
  3. Most people consume the docs by reading the Markdown in this repo. Would that change with the "fern" tool replacing Sphinx?

@Jont828 Jont828 force-pushed the fern-migration-parallel branch 2 times, most recently from 257be87 to 7d21ec5 Compare January 15, 2026 22:53
@Jont828
Contributor Author

Jont828 commented Jan 15, 2026

@grahamking Hey! Here's a link to Fern. Initially, I implemented this with Docusaurus, but I spoke with @nealvaidya and he's the one who suggested I use Fern because there's existing work with NVIDIA to pay for Fern's doc hosting services. Yes, the plan is to replace Sphinx and move the Fern docs into the docs/ folder, right now I just made the fern/ folder so we can start building the site without taking down the existing Sphinx set up.

@nealvaidya
Contributor

@coderabbitai review

@coderabbitai
Contributor

coderabbitai bot commented Jan 15, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 12

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

🤖 Fix all issues with AI agents
In `@fern/fern/pages/api/nixl_connect/device.mdx`:
- Around line 30-32: Replace all internal links in
fern/pages/api/nixl_connect/device.mdx that use kebab-case with the actual
underscore filenames: change
/additional-resources/api-reference/nixl-connect/device-kind →
/additional-resources/api-reference/nixl-connect/device_kind, /operation-status
→ /operation_status, /read-operation → /read_operation, /readable-operation →
/readable_operation, /writable-operation → /writable_operation, and
/rdma-metadata → /rdma_metadata; update the anchor links shown in the diff (the
two links in the device description and the links later in the file) so they
reference the underscore versions to match the actual .mdx file names.

In `@fern/fern/pages/backends/sglang/README.mdx`:
- Around line 39-44: Update the six broken links in the feature support matrix
by replacing the incorrect paths with the corrected ones: change
"/design-docs/disaggregated-serving" to "/design-docs/disagg-serving" for both
occurrences on lines containing "Disaggregated Serving" and "Conditional
Disaggregation"; change "/additional-resources/router-details/kv-cache-routing"
(the "KV-Aware Routing" link) to "/router/kv_cache_routing"; change
"/components/planner/sla-based-planner" (the "SLA-Based Planner" link) to
"/planner/sla_planner"; change
"/additional-resources/multimodal-details/sg-lang" (the "Multimodal Support"
link) to "/multimodal/sglang"; and change "/components/kvbm/architecture" (the
"KVBM" link) to "/kvbm/kvbm_architecture".

In `@fern/fern/pages/backends/trtllm/README.mdx`:
- Around line 44-49: Update the broken internal links in
fern/pages/backends/trtllm/README.mdx to use the correct path prefixes and file
names: replace "/design-docs/disaggregated-serving" with
"/design-docs/disagg-serving",
"/additional-resources/router-details/kv-cache-routing" with
"/router/kv-cache-routing", "/components/planner/sla-based-planner" with
"/planner/sla-based-planner", "/additional-resources/load-planner" with
"/planner/load-planner" (matching load_planner.mdx), and
"/components/kvbm/architecture" with "/kvbm/architecture" (matching
kvbm_architecture.mdx); apply the same corrections for the identical link
patterns found in backends/sglang/README.mdx, backends/vllm/README.mdx, and
design-docs/architecture.mdx so navigation entries match next.yml mapping.

In `@fern/fern/pages/benchmarks/sla_driven_profiling.mdx`:
- Line 11: Update the broken internal links in the MDX by replacing the old
paths with the corrected ones: change the link with text "SLA-Driven Profiling
and Planner Deployment Quick Start Guide" (currently pointing to
/components/planner/sla-planner-quick-start) to /planner/sla-planner-quickstart;
update any links pointing to /components/planner/sla-based-planner to
/planner/sla-planner; change links pointing to
/user-guides/tuning-disaggregated-performance to /performance/tuning; change
/additional-resources/advanced-kubernetes/api-reference to
/kubernetes/api-reference; and for the link currently pointing to
/kubernetes-deployment/observability-k-8-s/metrics, verify the correct target
with the docs team and either fix to the proper metrics path or remove/flag the
link if no matching doc exists (search for link text "metrics" or
"observability" in the file to locate it).

In `@fern/fern/pages/development/backend-guide.mdx`:
- Around line 143-165: The two internal links pointing to
/additional-resources/... are broken; update the link targets for "Request
Migration Architecture" and "Request Cancellation Architecture" so they point to
the correct internal paths (/fault-tolerance/request-migration and
/fault-tolerance/request-cancellation) by replacing the strings
"/additional-resources/fault-tolerance/request-migration" and
"/additional-resources/fault-tolerance/request-cancellation" in the Request
Migration and Request Cancellation sections of backend-guide.mdx.

In `@fern/fern/pages/getting-started/quickstart.mdx`:
- Around line 77-100: Update the broken internal links in the Documentation
Overview list: replace backend paths `/components/backends/v-llm`,
`/components/backends/sg-lang`, and `/components/backends/tensor-rt-llm` with
`/components/backends/vllm`, `/components/backends/sglang`, and
`/components/backends/tensorrt-llm` respectively; update user-guide paths
`/user-guides/tuning-disaggregated-performance` →
`/user-guides/disaggregation-and-performance-tuning` and
`/user-guides/finding-best-initial-configs` →
`/user-guides/finding-best-initial-configs-using-aiconfigurator`; and remove or
replace the non-existent `/additional-resources/cli-reference` entry (in the
same list block that contains "Performance & Tuning" and "Getting Help") with an
existing valid page or omit it.

In `@fern/fern/pages/kubernetes/deployment/create_deployment.mdx`:
- Line 154: Update the incomplete sentence and broken link in the line
containing "If you are a Dynamo contributor the [dynamo run
guide](/additional-resources/cli-reference)"; change the link target to
/reference/cli and insert the missing verb and punctuation so it reads like "If
you are a Dynamo contributor, see the [dynamo run guide](/reference/cli) for
details on how to run this command."

In `@fern/fern/pages/kvbm/kvbm_design_deepdive.mdx`:
- Around line 1-3: The frontmatter 'title' in kvbm_design_deepdive.mdx is
incorrect (it currently reads "KVBM components"); update the YAML frontmatter
title field to a correct, descriptive title that matches this file (e.g., "KVBM
design deep dive") so the page title, navigation, and browser tab reflect the
file's purpose; locate and edit the top-of-file frontmatter 'title' key to the
new value.

In `@fern/fern/pages/kvbm/kvbm_integrations.mdx`:
- Around line 22-23: In kvbm_integrations.mdx update the two internal links that
point to /components/kvbm/kvbm-in-v-llm and /components/kvbm/kvbm-in-trtllm so
they match the actual target filenames used in this PR (vllm-setup.mdx and
trtllm-setup.mdx); either change the hrefs to the correct paths that resolve to
vllm-setup and trtllm-setup (e.g., /components/kvbm/vllm-setup and
/components/kvbm/trtllm-setup) or update the docs.yml navigation to create
aliases for the existing paths—ensure the link targets in kvbm_integrations.mdx
exactly match the resolved route names used by Fern.

In `@fern/fern/pages/kvbm/vllm-setup.mdx`:
- Line 12: The internal link "/components/kvbm/architecture" in vllm-setup.mdx
is using the same broken pattern as trtllm-setup.mdx; open vllm-setup.mdx and
replace that href with the correct Fern-docs path that matches the site's
navigation (use the same corrected path you applied in trtllm-setup.mdx),
ensuring the link target (the "/components/kvbm/architecture" string) matches an
existing page slug in the repository and updates any relative/absolute pathing
accordingly.

In `@fern/fern/pages/multimodal/trtllm.mdx`:
- Line 36: Update the broken internal link in the TRT-LLM page by replacing the
currently referenced path
"/user-guides/multimodality-support#architecture-patterns" with the correct
internal path "/multimodal#architecture-patterns"; locate the sentence
containing "TRT-LLM supports aggregated and traditional disaggregated patterns"
(the link on "Architecture Patterns") in trtllm.mdx and make the same
replacement in the other files that contain the identical broken link (vllm.mdx
and sglang.mdx).

In `@fern/fern/pages/observability/metrics.mdx`:
- Around line 111-116: Update the metric names to use the project's `_total`
suffix convention: rename `dynamo_component_inflight_requests` and
`dynamo_frontend_inflight_requests` to
`dynamo_component_inflight_requests_total` and
`dynamo_frontend_inflight_requests_total`, and audit the other listed metrics
(`dynamo_component_request_bytes_total`,
`dynamo_component_request_duration_seconds`, `dynamo_component_requests_total`,
`dynamo_component_response_bytes_total`, `dynamo_component_uptime_seconds`) to
ensure any gauge-style metrics follow the `_total` naming (also apply same
changes mentioned for lines 153-156). Locate and update occurrences in the
observability docs and any corresponding metric export code so names match
exactly.
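
When auditing the listed metrics, a throwaway check can flag counter-style names missing the suffix — this is a hypothetical helper for illustration, not part of Dynamo or its CI:

```python
import re

# Hypothetical lint helper (not Dynamo code): flag counter-style metric
# names that are missing the project's `_total` suffix convention.
def missing_total_suffix(names):
    return [n for n in names if not re.search(r"_total$", n)]

metrics = [
    "dynamo_component_inflight_requests",
    "dynamo_frontend_inflight_requests",
    "dynamo_component_requests_total",
    "dynamo_component_request_bytes_total",
]
print(missing_total_suffix(metrics))
# → ['dynamo_component_inflight_requests', 'dynamo_frontend_inflight_requests']
```

Names like `dynamo_component_uptime_seconds` would need to be excluded by hand if they are intentionally unsuffixed gauges.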
♻️ Duplicate comments (2)
fern/fern/pages/frontends/kserve.mdx (1)

97-102: Use version-stable GitHub links (avoid tree/main).

These links will drift as main changes and break versioned docs. Please pin to tags/SHAs or use version-relative references for all main links in this section.

Also applies to: 106-106

fern/fern/pages/kubernetes/autoscaling.mdx (1)

47-47: Inconsistent documentation about DGDSA default behavior.

The documentation contains contradictory statements:

  • Line 47: "the operator automatically creates one adapter per service"
  • Line 102: "When DGDSA is enabled (the default)"
  • Line 127: "By default, no DGDSA is created for services"
  • Line 594-596: "With DGDSA Enabled (Default)"

Please clarify the actual default behavior and ensure consistency throughout the document.

Also applies to: 127-128

🟡 Minor comments (35)
fern/fern/pages/frontends/kserve.mdx-12-12 (1)

12-12: Hyphenate compound modifiers for readability.

Examples: “industry-standard”, “tensor-based”, “client-side”.

✏️ Suggested edits
-[KServe v2 API](https://github.com/kserve/kserve/tree/master/docs/predict-api/v2) is one of the industry standard protocol for machine learning model inference.
+[KServe v2 API](https://github.com/kserve/kserve/tree/master/docs/predict-api/v2) is one of the industry-standard protocols for machine learning model inference.

-* `ModelType::TensorBased` and `ModelInput::Tensor`: Combination for backend that is used for generic tensor based inference
+* `ModelType::TensorBased` and `ModelInput::Tensor`: Combination for backend that is used for generic tensor-based inference

-... specific conversion between generic tensor based messages ...
+... specific conversion between generic tensor-based messages ...

-This combination is used when the user is migrating an existing KServe based backend ...
+This combination is used when the user is migrating an existing KServe-based backend ...

-... metadata as tensor based deployment is generic ...
+... metadata as tensor-based deployment is generic ...

-... returned for client side logic ...
+... returned for client-side logic ...

Also applies to: 35-35, 41-41, 92-98

fern/fern/pages/guides/request_plane.mdx-163-167 (1)

163-167: Fix hyphenation for compound adjective.

"KV based routing" should be "KV-based routing" (hyphenated compound adjective). This appears twice in the NATS usage section. Based on static analysis hints.

📝 Suggested fix
 **When to use NATS:**
 - Production deployments with service discovery
-- Currently KV based routing require NATS. If you want to completely disable NATS, KV based routing won't be available
+- Currently KV-based routing requires NATS. If you want to completely disable NATS, KV-based routing won't be available
 - Need for message replay and persistence features

Note: Also corrected "require" → "requires" for subject-verb agreement.

fern/fern/pages/development/backend-guide.mdx-104-104 (1)

104-104: Typo in example: "generat" should be "generate".

📝 Suggested fix
-Node 2: namespace: llama3-1-8b, component: backend, endpoint: generat, model: /data/Llama-3.1-8B-Instruct/
+Node 2: namespace: llama3-1-8b, component: backend, endpoint: generate, model: /data/Llama-3.1-8B-Instruct/
fern/fern/pages/benchmarks/kv-router-ab-testing.mdx-104-104 (1)

104-104: Update link to internal Fern documentation.

The link on line 104 should point to the internal Fern documentation page instead of the external GitHub URL. An installation guide exists at /kubernetes/installation_guide in the Fern docs. Update the link from https://github.com/ai-dynamo/dynamo/blob/main/docs/kubernetes/installation_guide.md to the relative path.

fern/fern/pages/kvbm/kvbm_architecture.mdx-17-17 (1)

17-17: Typo: "eviction was on policies" appears garbled.

The phrase "eviction was on policies" doesn't make grammatical sense. This likely should be "eviction based on policies" or similar.

Suggested fix
-The middle layer, the KVBM layer, encapsulates the core logic of the KV block manager and serves as the runtime substrate for managing block memory. The KVBM adapter layer normalizes the representations and data layout for the incoming requests across runtimes and forwards them to the core memory manager. The KVBM and the core modules implement required internal functionality, such as table lookups, memory allocation, block layout management, lifecycle, and state transitions and block reuse or eviction was on policies. The KVBM layer also has required abstractions for external components to override or augment its behavior.
+The middle layer, the KVBM layer, encapsulates the core logic of the KV block manager and serves as the runtime substrate for managing block memory. The KVBM adapter layer normalizes the representations and data layout for the incoming requests across runtimes and forwards them to the core memory manager. The KVBM and the core modules implement required internal functionality, such as table lookups, memory allocation, block layout management, lifecycle, and state transitions and block reuse or eviction based on policies. The KVBM layer also has required abstractions for external components to override or augment its behavior.
fern/fern/pages/planner/load_planner.mdx-30-32 (1)

30-32: Fix duplicate list numbering.
Line 31 and Line 32 both use 1.; should be 1. and 2..

✏️ Suggested edit
-1. After a new decode worker is added, since it needs time to populate the kv cache, planner doesn't scale down the number of decode workers in the next `NEW_DECODE_WORKER_GRACE_PERIOD=3` adjustment intervals.
-1. We do not scale up prefill worker if the prefill queue size is estimated to reduce below the `--prefill-queue-scale-up-threshold` within the next `NEW_PREFILL_WORKER_QUEUE_BUFFER_PERIOD=3` adjustment intervals following the trend observed in the current adjustment interval.
+1. After a new decode worker is added, since it needs time to populate the kv cache, planner doesn't scale down the number of decode workers in the next `NEW_DECODE_WORKER_GRACE_PERIOD=3` adjustment intervals.
+2. We do not scale up prefill worker if the prefill queue size is estimated to reduce below the `--prefill-queue-scale-up-threshold` within the next `NEW_PREFILL_WORKER_QUEUE_BUFFER_PERIOD=3` adjustment intervals following the trend observed in the current adjustment interval.
fern/fern/pages/performance/tuning.mdx-10-40 (1)

10-40: Tighten a few grammar/wording typos in the intro/Callout.
Small edits improve readability without changing meaning (Line 12, Line 26, Line 39).

✏️ Suggested edits
-Specifically, there are three sets of parameters that needs to be tuned:
+Specifically, there are three sets of parameters that need to be tuned:

-The next thing to decide is how many numbers of GPU to serve the model.
+The next thing to decide is how many GPUs to serve the model.

-For decode-only engines, sometimes larger number of GPUs has to larger KV cache per GPU and more decoding requests running in parallel, which leads to both better throughput/GPU and better latency/user.
+For decode-only engines, a larger number of GPUs can yield larger KV cache per GPU and more decoding requests running in parallel, which leads to both better throughput/GPU and better latency/user.
fern/fern/pages/design-docs/architecture.mdx-42-49 (1)

42-49: Fix internal link slug for disaggregated serving design doc.
The link at line 44 uses /design-docs/disaggregated-serving, but the file is named disagg-serving.mdx with no explicit slug override. Update the link to /design-docs/disagg-serving to avoid a 404.

fern/fern/pages/design-docs/disagg-serving.mdx-78-81 (1)

78-81: Fix typo: "comptued" → "computed".

Line 80 contains a spelling error in the diagram message.

📝 Suggested fix
     P-->>D: Remote NIXL read for prefix hit KV blocks (non-block)
     P->>P: Execute prefill
-    P-->>D: Remote NIXL write for comptued KV blocks (non-block)
+    P-->>D: Remote NIXL write for computed KV blocks (non-block)
fern/fern/pages/design-docs/disagg-serving.mdx-89-89 (1)

89-89: Fix subject-verb agreement: "leverage" → "leverages".

📝 Suggested fix
-The key to high-performance disaggregation is efficient KV transfer. Dynamo leverage NIXL to transfer KV cache directly from the VRAM of prefill engine to the VRAM of decode engine. In addition, the KV transfer is non-blocking, allowing GPU forward pass to serve other requests in addition to the KV transfer.
+The key to high-performance disaggregation is efficient KV transfer. Dynamo leverages NIXL to transfer KV cache directly from the VRAM of prefill engine to the VRAM of decode engine. In addition, the KV transfer is non-blocking, allowing GPU forward pass to serve other requests in addition to the KV transfer.
fern/fern/pages/design-docs/distributed_runtime.mdx-46-46 (1)

46-46: Fix grammar: "isn't be registered" → "isn't registered".

There's a grammatical error in this sentence.

📝 Suggested fix
-- `Component`: When a `Component` object is created, similar to `Namespace`, it isn't be registered in etcd. When `create_service` is called, it creates a NATS service group using `{namespace_name}.{service_name}` as the service identifier and registers a service in the registry of the `Component`, where the registry is an internal data structure that tracks all services and endpoints within the `DistributedRuntime`.
+- `Component`: When a `Component` object is created, similar to `Namespace`, it isn't registered in etcd. When `create_service` is called, it creates a NATS service group using `{namespace_name}.{service_name}` as the service identifier and registers a service in the registry of the `Component`, where the registry is an internal data structure that tracks all services and endpoints within the `DistributedRuntime`.
fern/fern/pages/kubernetes/deployment/minikube.mdx-10-10 (1)

10-10: Clarify setup wording and make GPU flag optional (Line 10, Line 26–Line 28).
Avoids confusion and prevents CPU-only users from hitting a failure.

✏️ Suggested edits
-This guide walks through the set up of everything you need to run Dynamo Kubernetes Platform locally.
+This guide walks through the setup you need to run Dynamo Kubernetes Platform locally.

-# Start Minikube with GPU support (if configured)
-minikube start --driver docker --container-runtime docker --gpus all --memory=16000mb --cpus=8
+# Start Minikube (omit --gpus all if you aren't using GPU support)
+minikube start --driver docker --container-runtime docker --gpus all --memory=16000mb --cpus=8

Also applies to: 26-28

fern/fern/pages/backends/vllm/README.mdx-10-10 (1)

10-10: Minor wording fixes for readability (Line 10, Line 59, Line 173).
These are small grammar/consistency tweaks.

✏️ Proposed doc wording edits
-... NIXL based transfer mechanisms ...
+... NIXL-based transfer mechanisms ...

-Below we provide a guide that lets you run all of our the common deployment patterns on a single node.
+Below we provide a guide that lets you run all of the common deployment patterns on a single node.

-... Python's builtin hashing ...
+... Python's built-in hashing ...

Also applies to: 59-59, 173-173

fern/fern/pages/backends/vllm/deepseek-r1.mdx-10-14 (1)

10-14: Fix a few typos/grammar issues (Line 10–Line 14).
Improves professionalism and readability.

✏️ Suggested edits
-Dynamo supports running Deepseek R1 with data parallel attention and wide expert parallelism. Each data parallel attention rank is a seperate dynamo component that will emit its own KV Events and Metrics.
+Dynamo supports running Deepseek R1 with data parallel attention and wide expert parallelism. Each data parallel attention rank is a separate Dynamo component that will emit its own KV Events and Metrics.

-The following script can be adapted to run Deepseek R1 with a variety of different configuration.
+The following script can be adapted to run Deepseek R1 with a variety of configurations.
fern/fern/pages/backends/vllm/gpt-oss.mdx-115-115 (1)

115-115: Fix typo: "ususally" → "usually".

✏️ Suggested fix
-is that the application has a set of tools to aid the assistant provide accurate answer, and it is ususally
+is that the application has a set of tools to aid the assistant in providing an accurate answer, and it is usually
fern/fern/pages/backends/trtllm/multinode/multinode-examples.mdx-213-213 (1)

213-213: Typo: "succesfully" should be "successfully".

📝 Suggested fix
-   until the model succesfully finishes loading:
+   until the model successfully finishes loading:
fern/fern/pages/kubernetes/deployment/create_deployment.mdx-230-230 (1)

230-230: Step numbering jumps from Step 3 to Step 6.

The document has Steps 1, 2, 3, then jumps directly to Step 6 for LoRA deployment. This suggests either missing intermediate steps or a renumbering oversight.

📝 Suggested fix
-## Step 6: Deploy LoRA Adapters (Optional)
+## Step 4: Deploy LoRA Adapters (Optional)

Alternatively, if Steps 4-5 exist elsewhere and were removed, ensure the numbering is sequential.

fern/fern/pages/backends/trtllm/multinode/multinode-examples.mdx-81-82 (1)

81-82: Typo: "iamge" should be "image".

📝 Suggested fix
 # NOTE: IMAGE must be set manually for now
-# To build an iamge, see the steps here:
+# To build an image, see the steps here:
fern/fern/pages/backends/trtllm/multinode/multinode-examples.mdx-43-44 (1)

43-44: Path references old Sphinx docs location.

The link docs/backends/trtllm/README.md references the current Sphinx documentation path. After the Fern migration completes and fern/ replaces docs/, this link will break. Consider using a relative Fern path or noting this needs updating post-migration.

Similarly affected: lines 82-83 reference the same docs/ path pattern.

fern/fern/pages/multimodal/sglang.mdx-336-340 (1)

336-340: Clarify NIXL usage for E/P/D mode in the table.

The table states E/P/D transfers embeddings to "Prefill" but line 142 and the workflow diagram show embeddings go to the Decode Worker first (which is the entry point), then Decode coordinates with Prefill. This creates a potential inconsistency.

📝 Suggested fix for accuracy
 | Use Case | NIXL Used? | Data Transfer | Notes |
 |----------|------------|---------------|-------|
 | E/PD (Encode Separate) | Yes | Encoder → PD (embeddings) | Vision encoder separate |
-| E/P/D (Full Disaggregation) | Yes | Encoder → Prefill (embeddings) | KV cache via SGLang bootstrap |
+| E/P/D (Full Disaggregation) | Yes | Encoder → Decode (embeddings) | KV cache via SGLang bootstrap |
fern/fern/pages/backends/trtllm/gpt-oss.mdx-216-216 (1)

216-216: Typo: "ususally" should be "usually".

Suggested fix
-is that the application has a set of tools to aid the assistant provide accurate answer, and it is ususally
+is that the application has a set of tools to aid the assistant in providing an accurate answer, and it is usually
fern/fern/pages/kubernetes/api_reference.mdx-20-25 (1)

20-25: Duplicate paragraph content.

Lines 20-21 and 25 contain the same sentence about "Package v1alpha1 contains API Schema definitions". This appears to be unintentional duplication.

Suggested fix
 Package v1alpha1 contains API Schema definitions for the nvidia.com v1alpha1 API group.
 
 This package defines the DynamoGraphDeploymentRequest (DGDR) custom resource, which provides
 a high-level, SLA-driven interface for deploying machine learning models on Dynamo.
 
-Package v1alpha1 contains API Schema definitions for the nvidia.com v1alpha1 API group.
-
 ### Resource Types
fern/fern/pages/backends/trtllm/gpt-oss.mdx-176-176 (1)

176-176: Section numbering inconsistency - step 5 is missing.

The instructions jump from "### 4. Launch the Deployment" (line 124) to "### 6. Verify the Deployment is Ready" (line 176). Either add step 5 or renumber to maintain sequential ordering.

Suggested fix
-### 6. Verify the Deployment is Ready
+### 5. Verify the Deployment is Ready

And update subsequent sections (7 → 6, 8 → 7).

fern/fern/pages/observability/README.mdx-37-37 (1)

37-37: Minor grammatical fix: "Documentations" → "Documentation".

"Documentation" is typically used as an uncountable noun in English.

📝 Suggested fix
-## Observability Documentations
+## Observability Documentation
fern/fern/pages/backends/sglang/gpt-oss.mdx-10-11 (1)

10-11: Fix typo: "ues" → "use".

📝 Suggested fix
 The gpt-oss-120b guide for SGLang is largely identical to the [guide for vLLM](/additional-resources/backend-details/v-llm/gpt-oss),
-please ues the vLLM guide as a reference with the different deployment steps as highlighted below:
+please use the vLLM guide as a reference with the different deployment steps as highlighted below:
fern/fern/pages/multimodal/index.mdx-18-22 (1)

18-22: Fill or remove the empty “Backend Documentation” section.
Right now it’s an orphaned header; either add links to the backend-specific pages or drop the section to avoid a dead spot in the page.

fern/fern/pages/api/nixl_connect/descriptor.mdx-10-11 (1)

10-11: Fix small typos/grammar in the Descriptor overview and registration note.
These are user-facing docs, so it’s worth polishing the wording.

✏️ Proposed edits
-Memory descriptor that ensures memory is registered with the NIXL-base I/O subsystem.
+Memory descriptor that ensures memory is registered with the NIXL-based I/O subsystem.

-3. From a Python `bytes` object. Memory is assumed to reside in CPU addressable host memory.
+3. From a Python `bytes` object. Memory is assumed to reside in CPU-addressable host memory.

-When the descriptor is assigned to a NIXL operation, it will be automatically registered if was not explicitly registered.
+When the descriptor is assigned to a NIXL operation, it will be automatically registered if it was not explicitly registered.

Also applies to: 21-21, 39-39

fern/fern/pages/backends/trtllm/llama4_plus_eagle.mdx-20-28 (1)

20-28: Clean up the setup note wording to avoid confusion.

✏️ Proposed edits
-* Make sure the (`eagle3_one_model: true`) is set in the LLM API config inside the `examples/backends/trtllm/engine_configs/llama4/eagle` folder.
+* Make sure `eagle3_one_model: true` is set in the LLM API config inside the `examples/backends/trtllm/engine_configs/llama4/eagle` folder.

-Assuming you have already allocated your nodes via `salloc`, and are
-inside an interactive shell on one of the allocated nodes, set the
-following environment variables based:
+Assuming you have already allocated your nodes via `salloc`, and are
+inside an interactive shell on one of the allocated nodes, set the
+following environment variables based on your environment:
fern/fern/pages/api/nixl_connect/README.mdx-10-14 (1)

10-14: Hyphenate compound adjective.

“container hosted” should be “container-hosted” for correct grammar.

💡 Suggested fix
-The `dynamo.nixl_connect` library can be imported by any Dynamo container hosted application.
+The `dynamo.nixl_connect` library can be imported by any Dynamo container-hosted application.
fern/fern/pages/api/nixl_connect/README.mdx-107-110 (1)

107-110: Fix stray “KV$” typo.

Looks like a formatting artifact; should read “KV cache”.

💡 Suggested fix
-6. Prefill Worker receives the embeddings from Encode Worker and generates a key-value cache (KV$) update for Decode Worker's LLM and writes the update directly to the GPU memory reserved for the data.
+6. Prefill Worker receives the embeddings from Encode Worker and generates a key-value cache (KV) update for Decode Worker's LLM and writes the update directly to the GPU memory reserved for the data.
fern/fern/pages/observability/health-checks.mdx-58-62 (1)

58-62: Fix port mismatch in example request.

The earlier section says frontend defaults to 8000; this example uses 8080.

💡 Suggested fix
-curl -s localhost:8080/live -q | jq
+curl -s localhost:8000/live -q | jq
fern/fern/pages/observability/metrics.mdx-20-23 (1)

20-23: Correct lines 107 and 109: they incorrectly state "port 8081 by default"

The table correctly specifies default -1 (disabled), but lines 107 and 109 contradict this by claiming metrics are exposed "on port 8081 by default." The actual default is -1 (disabled); 8081 is only the example port shown in documentation. Align these lines with the table to clarify that users must explicitly set DYN_SYSTEM_PORT to enable metrics.

fern/fern/pages/backends/trtllm/README.mdx-163-172 (1)

163-172: Duplicate sections: "Client" and "Benchmarking" appear twice.

Lines 163-172 contain "Client" and "Benchmarking" sections, but these are duplicated at lines 207-216 with identical content. This redundancy should be removed.

Proposed fix: Remove duplicate sections (lines 207-216)
-## Client
-
-See [client](/components/backends/sg-lang#testing-the-deployment) section to learn how to send request to the deployment.
-
-NOTE: To send a request to a multi-node deployment, target the node which is running `python3 -m dynamo.frontend <args>`.
-
-## Benchmarking
-
-To benchmark your deployment with AIPerf, see this utility script, configuring the
-`model` name and `host` based on your deployment: [perf.sh](https://github.com/ai-dynamo/dynamo/tree/main/benchmarks/llm/perf.sh)
-
 ## Multimodal support

Also applies to: 207-216

fern/fern/pages/backends/trtllm/README.mdx-61-61 (1)

61-61: Fix grammatical error.

"all of our the common deployment patterns" should be "all of the common deployment patterns" or "all our common deployment patterns".

-Below we provide a guide that lets you run all of our the common deployment patterns on a single node.
+Below we provide a guide that lets you run all of the common deployment patterns on a single node.
fern/fern/pages/multimodal/trtllm.mdx-259-259 (1)

259-259: Verify container build reference path.

Line 259 references docs/backends/trtllm/README.md#build-container which appears to be a path from the old Sphinx docs structure, not the new Fern structure.

-# Container image (build using docs/backends/trtllm/README.md#build-container)
+# Container image (build using /backends/tensor-rt-llm#build-container)
🧹 Nitpick comments (18)
fern/fern/pages/guides/request_plane.mdx (1)

171-195: Consolidate duplicate example sections.

The "Complete Example" (lines 171-176) and "Real-World Example" (lines 178-195) sections both reference the same script file examples/backends/vllm/launch/agg_request_planes.sh. Consider consolidating these into a single section to avoid redundancy.

📝 Suggested consolidation
 ## Complete Example

-Here's a complete example showing how to launch a Dynamo deployment with different request planes:
-
-See [`examples/backends/vllm/launch/agg_request_planes.sh`](https://github.com/ai-dynamo/dynamo/tree/main/examples/backends/vllm/launch/agg_request_planes.sh) for a complete working example that demonstrates launching Dynamo with TCP, HTTP, or NATS request planes.
-
-
-## Real-World Example
-
-The Dynamo repository includes a complete example demonstrating all three request planes:
+The Dynamo repository includes a complete working example demonstrating all three request planes:

 **Location:** `examples/backends/vllm/launch/agg_request_planes.sh`

+See the [source on GitHub](https://github.com/ai-dynamo/dynamo/tree/main/examples/backends/vllm/launch/agg_request_planes.sh).
+
 ```bash
 cd examples/backends/vllm/launch
fern/fern/pages/development/backend-guide.mdx (2)

19-58: Code example has missing imports and no language identifier.

The code block is missing a language identifier for proper syntax highlighting (should be python), and the example is missing imports for uvloop and asyncio which are used at lines 56-57.

📝 Suggested improvements
-```
+```python
 from dynamo.llm import ModelInput, ModelType, register_llm
 from dynamo.runtime import DistributedRuntime, dynamo_worker
+import asyncio
+import uvloop

87-87: Minor grammar: use hyphenated "load-balanced" as compound adjective.

📝 Suggested fix
-* *Component*: A load balanced service needed to run that pipeline.
+* *Component*: A load-balanced service needed to run that pipeline.
fern/fern/pages/getting-started/intro.mdx (1)

1-72: Clarify the purpose of this file vs. quickstart.mdx - content appears duplicated.

This file (intro.mdx) has the same title "Welcome to NVIDIA Dynamo" and nearly identical content as quickstart.mdx (lines 1-72 are essentially the same). Having two files with the same title in the same directory will cause confusion in navigation and SEO.

Consider one of the following:

  1. Remove one file if they serve the same purpose
  2. Differentiate the content - e.g., make intro.mdx a high-level overview without quickstart commands, and keep quickstart.mdx focused on hands-on setup
  3. Rename with distinct titles if both are needed
fern/fern/pages/benchmarks/sla_driven_profiling.mdx (1)

106-106: Optional: Minor grammar and style improvements.

Consider these minor refinements for more formal technical writing:

  1. Line 106: "fix this issue" → consider "resolve this issue" for more formal wording
  2. Line 108: Hyphenate compound adjectives: "per GPU throughput" → "per-GPU throughput", "y coordinate" → "y-coordinate"
  3. Line 112: Hyphenate compound adjective: "computation bound MLP kernel" → "computation-bound MLP kernel"
📝 Suggested style improvements
-We are working on framework-side change to fix this issue. For example, the below plot shows the decode parallelization mapping sweep results for H100 for deepseek-ai/DeepSeek-R1-Distill-Llama-8B.
+We are working on a framework-side change to resolve this issue. For example, the plot below shows the decode parallelization mapping sweep results on H100 for deepseek-ai/DeepSeek-R1-Distill-Llama-8B.
-4. **Recommendation**: Selects optimal parallelization mapping for prefill and decode that achieves the highest per GPU throughput while adhering the SLA on TTFT and ITL. Specifically, the profiler will choose the point (or a point on the curve for decode) that is left to the vertical red dashed line that represents the SLAs while has the highest y coordinate (throughput per GPU).
+4. **Recommendation**: Selects the optimal parallelization mapping for prefill and decode that achieves the highest per-GPU throughput while adhering to the SLA on TTFT and ITL. Specifically, the profiler will choose the point (or a point on the curve for decode) left of the vertical red dashed line representing the SLAs that has the highest y-coordinate (throughput per GPU).
-The active kv usage determines the complexity of the memory-bounded attention kernel while the active kv usage divided the average context length determines the complexity of the computation bound MLP kernel.
+The active kv usage determines the complexity of the memory-bounded attention kernel, while the active kv usage divided by the average context length determines the complexity of the computation-bound MLP kernel.

Also applies to: 108-108, 112-112
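
The selection rule in the Recommendation step can be sketched in a few lines; the sweep numbers below are made up for illustration, not real profiler output:

```python
def best_config(points, sla_latency_ms):
    """Pick the config with the highest throughput among those meeting the SLA.

    points: list of (name, latency_ms, throughput_per_gpu) tuples, e.g. one
    per parallelization mapping swept by the profiler.
    Returns None if no point satisfies the SLA.
    """
    feasible = [p for p in points if p[1] <= sla_latency_ms]
    if not feasible:
        return None
    # "Left of the red dashed line" = latency <= SLA; "highest y-coordinate"
    # = maximum throughput per GPU among the feasible points.
    return max(feasible, key=lambda p: p[2])

# Hypothetical sweep results: (mapping, ITL in ms, tokens/s per GPU)
sweep = [("TP1", 45.0, 900.0), ("TP2", 28.0, 700.0), ("TP4", 15.0, 420.0)]
print(best_config(sweep, sla_latency_ms=30.0))  # ('TP2', 28.0, 700.0)
```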

fern/fern/pages/benchmarks/kv-router-ab-testing.mdx (1)

797-801: Consider clarifying that files are user-created, not provided.

The appendix references files like prepare-dataset.sh and "Results CSVs" that aren't provided in the guide. Users create these locally by following the steps. Consider adding a brief note clarifying this, or removing references to files that are implicitly created through the guide's commands.

📝 Suggested clarification
 ## Appendix: Files Reference
 
-- `router-off-deployment.yaml`: Standard routing deployment
-- `router-on-deployment.yaml`: KV router enabled deployment
-- `benchmark-job.yaml`: AIPerf benchmark pod
-- `prepare-dataset.sh`: Dataset preparation script
-- Results CSVs: Detailed metrics from AIPerf
+**Files you create during this guide:**
+- `router-off-deployment.yaml`: Standard routing deployment
+- `router-on-deployment.yaml`: KV router enabled deployment  
+- `benchmark-job.yaml`: AIPerf benchmark pod
+- `mooncake_trace_4x.jsonl`: Prepared benchmark dataset
+- `router_off_results.csv` / `router_on_results.csv`: AIPerf output metrics
fern/fern/pages/kvbm/trtllm-setup.mdx (1)

100-100: Minor grammatical issue: missing subject.

The sentence "Alternatively, can use..." is missing a subject.

Suggested fix
-Alternatively, can use "trtllm-serve" with KVBM by replacing the above two [DYNAMO] cmds with below:
+Alternatively, you can use `trtllm-serve` with KVBM by replacing the above two [DYNAMO] commands with the following:
fern/fern/pages/kvbm/kvbm_integrations.mdx (2)

32-33: Consider escaping or spacing the ampersand for reliable rendering.

Host&Disk may render inconsistently across Markdown processors. Using Host & Disk or Host and Disk would be safer.

Suggested fix
-![Offloading blocks from Device to Host&Disk](../../assets/img/kvbm-offload.png)
-**Offloading blocks from Device to Host&Disk**
+![Offloading blocks from Device to Host & Disk](../../assets/img/kvbm-offload.png)
+**Offloading blocks from Device to Host & Disk**

10-11: Consider breaking up dense paragraph for readability.

Lines 10-11 contain a lot of information in a single block. For documentation clarity, consider using bullet points or splitting into multiple paragraphs to separate the Scheduler and Worker component descriptions.

fern/fern/pages/kvbm/vllm-setup.mdx (1)

90-90: Minor grammatical issue: missing subject (same as trtllm-setup).

Suggested fix
-Alternatively, can use `vllm serve` directly to use KVBM for aggregated serving:
+Alternatively, you can use `vllm serve` directly to use KVBM for aggregated serving:
fern/fern/pages/backends/vllm/prometheus.mdx (1)

14-20: Optional: consolidate repeated “For …” sentences into a short “See also” list (lines 14–20).
Helps flow without changing meaning.

♻️ Suggested refactor
-**For the complete and authoritative list of all vLLM metrics**, always refer to the [official vLLM Metrics Design documentation](https://docs.vllm.ai/en/latest/design/metrics.html).
-
-**For LMCache metrics and integration**, see the [LMCache Integration Guide](/components/kvbm/lm-cache-integration).
-
-**For Dynamo runtime metrics**, see the [Dynamo Metrics Guide](/user-guides/observability-local/metrics).
-
-**For visualization setup instructions**, see the [Prometheus and Grafana Setup Guide](/user-guides/observability-local/prometheus-grafana-setup).
+**See also:**
+- [vLLM Metrics Design Documentation](https://docs.vllm.ai/en/latest/design/metrics.html)
+- [LMCache Integration Guide](/components/kvbm/lm-cache-integration)
+- [Dynamo Metrics Guide](/user-guides/observability-local/metrics)
+- [Prometheus and Grafana Setup Guide](/user-guides/observability-local/prometheus-grafana-setup)
fern/fern/pages/backends/sglang/README.mdx (1)

138-138: Minor style suggestion: simplify phrasing.

Consider simplifying "in order to support" to "to support" for conciseness.

✏️ Suggested change
-We are in the process of shipping pre-built docker containers that contain installations of DeepEP, DeepGEMM, and NVSHMEM in order to support WideEP and P/D. For now, you can quickly build the container from source with the following command.
+We are in the process of shipping pre-built docker containers that contain installations of DeepEP, DeepGEMM, and NVSHMEM to support WideEP and P/D. For now, you can quickly build the container from source with the following command.
fern/fern/pages/kubernetes/grove.mdx (1)

99-105: Consider varying sentence structure for better flow.

Three consecutive sentences begin with "For". Consider rewording for variety.

✏️ Suggested revision
-For KAI Scheduler, see the [KAI Scheduler Deployment Guide](https://github.com/NVIDIA/KAI-Scheduler).
+See the [KAI Scheduler Deployment Guide](https://github.com/NVIDIA/KAI-Scheduler) for installation instructions.

-For installation instructions, see the [Grove Installation Guide](https://github.com/NVIDIA/grove/blob/main/docs/installation.md).
+The [Grove Installation Guide](https://github.com/NVIDIA/grove/blob/main/docs/installation.md) provides detailed setup steps.

-For practical examples of Grove-based multinode deployments in action, see the [Multinode Deployment Guide](/kubernetes-deployment/multinode/multinode-deployments), which demonstrates multi-node disaggregated serving scenarios.
+Practical examples of Grove-based multinode deployments are available in the [Multinode Deployment Guide](/kubernetes-deployment/multinode/multinode-deployments), which demonstrates multi-node disaggregated serving scenarios.
fern/fern/pages/kubernetes/deployment/multinode-deployment.mdx (1)

147-155: Minor inconsistency in YAML value formatting.

At line 139, nodeCount: 2 is shown without quotes, but at line 154, the examples show multinode.nodeCount: "2" with quotes. While both may work, the documentation should be consistent about whether these are integer or string values.

fern/fern/pages/multimodal/sglang.mdx (1)

72-74: GitHub source links use blob paths instead of tree paths.

The links to source files use /tree/main/ which is typically for directories. For individual files, GitHub expects /blob/main/. However, GitHub auto-redirects tree→blob for files, so this works but is not canonical.

🔧 Optional: Use canonical blob paths for file links
- - [MultimodalEncodeWorkerHandler](https://github.com/ai-dynamo/dynamo/tree/main/components/src/dynamo/sglang/request_handlers/multimodal/encode_worker_handler.py) for encoding
- - [MultimodalWorkerHandler](https://github.com/ai-dynamo/dynamo/tree/main/components/src/dynamo/sglang/request_handlers/multimodal/worker_handler.py) for prefilling and decoding.
- - [MultimodalProcessorHandler](https://github.com/ai-dynamo/dynamo/tree/main/components/src/dynamo/sglang/request_handlers/multimodal/processor_handler.py)
+ - [MultimodalEncodeWorkerHandler](https://github.com/ai-dynamo/dynamo/blob/main/components/src/dynamo/sglang/request_handlers/multimodal/encode_worker_handler.py) for encoding
+ - [MultimodalWorkerHandler](https://github.com/ai-dynamo/dynamo/blob/main/components/src/dynamo/sglang/request_handlers/multimodal/worker_handler.py) for prefilling and decoding.
+ - [MultimodalProcessorHandler](https://github.com/ai-dynamo/dynamo/blob/main/components/src/dynamo/sglang/request_handlers/multimodal/processor_handler.py)
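
The tree→blob rewrite is mechanical enough to automate; an illustrative helper (not part of the docs pipeline), using a file-extension heuristic to decide whether a path points at a file:

```python
import re

def canonical_file_url(url):
    """Rewrite a GitHub /tree/<ref>/ URL to /blob/<ref>/ when it points at a file.

    Heuristic: treat paths whose last segment contains a dot as files.
    GitHub redirects tree -> blob for files anyway; this just makes links
    canonical. Directory URLs are returned unchanged.
    """
    m = re.match(r"(https://github\.com/[^/]+/[^/]+)/tree/([^/]+)/(.+)", url)
    if m and "." in m.group(3).rsplit("/", 1)[-1]:
        return f"{m.group(1)}/blob/{m.group(2)}/{m.group(3)}"
    return url

print(canonical_file_url(
    "https://github.com/ai-dynamo/dynamo/tree/main/components/src/dynamo/sglang/request_handlers/multimodal/worker_handler.py"
))  # prints the same URL with /tree/ replaced by /blob/
```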
fern/fern/pages/api/nixl_connect/device_kind.mdx (1)

24-33: Consider adding ReadableOperation to related classes.

The Related Classes section lists WritableOperation but not ReadableOperation. Based on the AI summary mentioning both ReadableOperation and WritableOperation as part of the NIXL Connect API, consider adding ReadableOperation for completeness.

Suggested addition
   - [RdmaMetadata](/additional-resources/api-reference/nixl-connect/rdma-metadata)
   - [ReadOperation](/additional-resources/api-reference/nixl-connect/read-operation)
+  - [ReadableOperation](/additional-resources/api-reference/nixl-connect/readable-operation)
   - [WritableOperation](/additional-resources/api-reference/nixl-connect/writable-operation)
   - [WriteOperation](/additional-resources/api-reference/nixl-connect/write-operation)
fern/fern/pages/api/nixl_connect/connector.mdx (1)

13-13: Tighten the GPU Direct RDMA sentence for readability.

✏️ Suggested rewording
-This class provides a "pythonic" interface using NIXL library to utilize GPU Direct RDMA accelerated, when available, data transfers between models hosted by different workers in a Dynamo graph.
+This class provides a "pythonic" interface to the NIXL library for GPU Direct RDMA–accelerated data transfers (when available) between models hosted by different workers in a Dynamo graph.
fern/fern/pages/kubernetes/installation_guide.mdx (1)

335-338: Minor: Capitalize "just" at start of sentence.

-just add the following to the helm install command:
+Just add the following to the helm install command:

@Jont828 Jont828 force-pushed the fern-migration-parallel branch from 902f206 to a653d66 Compare January 16, 2026 02:35
@Jont828
Copy link
Contributor Author

Jont828 commented Jan 16, 2026

@grahamking I think I got the relative links working, so the docs will still work in VSCode/GitHub. Shouldn't be any breaking changes.

@Jont828
Copy link
Contributor Author

Jont828 commented Jan 16, 2026

@coderabbitai Review

@Jont828 Jont828 force-pushed the fern-migration-parallel branch from bc28d24 to 318e06a Compare January 16, 2026 03:47
@nealvaidya
Copy link
Contributor

/ok to test 2863687

Jont828 and others added 7 commits January 23, 2026 16:15
Signed-off-by: Jont828 <jt572@cornell.edu>
Signed-off-by: Jont828 <jt572@cornell.edu>
Signed-off-by: Jont828 <jt572@cornell.edu>
Signed-off-by: Neal Vaidya <nealv@nvidia.com>
Signed-off-by: Neal Vaidya <nealv@nvidia.com>
Signed-off-by: Jont828 <jt572@cornell.edu>
Signed-off-by: Jont828 <jt572@cornell.edu>
@nealvaidya nealvaidya force-pushed the fern-migration-parallel branch from 2863687 to db98ba4 Compare January 24, 2026 00:15
@github-actions github-actions bot added ci Issues/PRs that reference CI build/test and removed ci Issues/PRs that reference CI build/test labels Jan 24, 2026
@nealvaidya
Copy link
Contributor

/ok to test db98ba4

@dagil-nvidia
Copy link
Collaborator

This is great - my biggest questions are more operational:

  1. How quickly can we deprecate Sphinx and move everything over to Fern
  2. If we are going to be in an in-between state, how do we handle docs updates during that time?

@Jont828
Copy link
Contributor Author

Jont828 commented Jan 26, 2026

@dagil-nvidia Great questions!

  1. That's more dependent on your guys' end. We'd need to get a Fern account/subscription set up under NVIDIA (currently I just made my own account) and set up the credentials in the repo to push to the live site. Then, I'm assuming you'd need to change docs.nvidia.com/dynamo to point to Fern. From my end, everything else is ready.
  2. In the meantime, I think we'd just need to ask incoming PRs with doc changes to also update on Fern as well. I can help to keep it up to date and when we're going to deprecate Sphinx, we can sync up the docs again in case we missed something.

@github-actions github-actions bot removed the ci Issues/PRs that reference CI build/test label Jan 26, 2026
@nealvaidya nealvaidya enabled auto-merge (squash) January 26, 2026 18:29
@github-actions github-actions bot added the ci Issues/PRs that reference CI build/test label Jan 26, 2026
@nealvaidya
Copy link
Contributor

/ok to test 16dae4f

@nealvaidya nealvaidya merged commit f9050aa into ai-dynamo:main Jan 26, 2026
46 of 55 checks passed
alec-flowers pushed a commit that referenced this pull request Jan 27, 2026
Signed-off-by: Jont828 <jt572@cornell.edu>
Signed-off-by: Neal Vaidya <nealv@nvidia.com>
Co-authored-by: Neal Vaidya <nealv@nvidia.com>
soodoshll pushed a commit to soodoshll/dynamo that referenced this pull request Feb 12, 2026
Signed-off-by: Jont828 <jt572@cornell.edu>
Signed-off-by: Neal Vaidya <nealv@nvidia.com>
Co-authored-by: Neal Vaidya <nealv@nvidia.com>
Labels

ci Issues/PRs that reference CI build/test docs documentation Improvements or additions to documentation external-contribution Pull request is from an external contributor size/XXL
