Commit 0748173

addressing coderabbit review

Signed-off-by: Peter St. John <pstjohn@nvidia.com>
Parent: 773d581

File tree

17 files changed: +64, -28 lines

CLAUDE.md

Lines changed: 3 additions & 0 deletions

@@ -42,6 +42,9 @@ pre-commit run --all-files
 pre-commit run
 ```
 
+Do not copy license headers from other files, instead allow the license-check.py script to add the license header during
+pre-commit to ensure the proper year is used.
+
 Pre-commit includes:
 
 - Ruff linting/formatting (line-length: 119, Google-style docstrings)

bionemo-recipes/models/esm2/collator.py

Lines changed: 1 addition & 1 deletion

@@ -842,7 +842,7 @@ def _process_tensor_bshd(
     total_chunks = 2 * cp_world_size
     chunk_size = seq_len // total_chunks
 
-    if chunk_size == 0:
+    if seq_len % total_chunks != 0:
         raise ValueError(
             f"Sequence length {seq_len} must be divisible by {total_chunks} "
             f"(2 * cp_world_size) for BSHD context parallelism"
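The change above tightens a weaker guard: `chunk_size == 0` only fires when the sequence is shorter than the chunk count, while the modulo check also rejects lengths that divide unevenly. A standalone sketch of the corrected logic (illustrative only, not the collator's actual code):

```python
def check_bshd_divisibility(seq_len: int, cp_world_size: int) -> int:
    """Return the per-chunk size, rejecting lengths that don't split evenly."""
    total_chunks = 2 * cp_world_size  # two chunks per CP rank for load balancing
    if seq_len % total_chunks != 0:
        raise ValueError(
            f"Sequence length {seq_len} must be divisible by {total_chunks} "
            f"(2 * cp_world_size) for BSHD context parallelism"
        )
    return seq_len // total_chunks

# With seq_len=10, cp_world_size=2: the old check passed
# (chunk_size = 10 // 4 = 2, which is nonzero), but 10 % 4 != 0,
# so the new check correctly raises.
```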

bionemo-recipes/models/esm2/tests/common/__init__.py

Lines changed: 2 additions & 1 deletion

@@ -18,7 +18,8 @@
 This package provides reusable test infrastructure following HuggingFace
 transformers patterns, including:
 
-- BaseModelTest: Base test class with all common test methods - TestTolerances: Dataclass for model-specific numerical tolerances
+- BaseModelTest: Base test class with all common test methods
+- TestTolerances: Dataclass for model-specific numerical tolerances
 - Distributed testing utilities for multi-GPU tests
 - Shared fixtures for common test requirements

bionemo-recipes/models/llama3/collator.py

Lines changed: 1 addition & 1 deletion

@@ -842,7 +842,7 @@ def _process_tensor_bshd(
     total_chunks = 2 * cp_world_size
     chunk_size = seq_len // total_chunks
 
-    if chunk_size == 0:
+    if seq_len % total_chunks != 0:
         raise ValueError(
             f"Sequence length {seq_len} must be divisible by {total_chunks} "
             f"(2 * cp_world_size) for BSHD context parallelism"

bionemo-recipes/models/llama3/tests/common/__init__.py

Lines changed: 2 additions & 1 deletion

@@ -18,7 +18,8 @@
 This package provides reusable test infrastructure following HuggingFace
 transformers patterns, including:
 
-- BaseModelTest: Base test class with all common test methods - TestTolerances: Dataclass for model-specific numerical tolerances
+- BaseModelTest: Base test class with all common test methods
+- TestTolerances: Dataclass for model-specific numerical tolerances
 - Distributed testing utilities for multi-GPU tests
 - Shared fixtures for common test requirements

bionemo-recipes/models/mixtral/README.md

Lines changed: 34 additions & 4 deletions

@@ -22,6 +22,9 @@ The Mixtral implementation natively supports the following TransformerEngine-pro
 
 ### Quick start: convert and run
 
+> **Note:** The snippets below use bare imports (e.g., `from convert import ...`). Run them from the
+> `bionemo-recipes/models/mixtral` directory, or install dependencies first with `pip install -r requirements.txt`.
+
 ```python
 import torch
 from transformers import AutoModelForCausalLM, AutoTokenizer
@@ -44,7 +47,7 @@ inputs = tokenizer("The quick brown fox", return_tensors="pt")
 inputs = {k: v.to("cuda") for k, v in inputs.items()}
 
 with torch.no_grad():
-    output_ids = model_te.generate(**inputs, max_new_tokens=16, use_cache=False)
+    output_ids = model_te.generate(**inputs, max_new_tokens=16)
 
 print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
 ```
@@ -57,6 +60,9 @@ inference, and back to Hugging Face Transformers format for sharing and deployme
 
 ### Converting from HF Transformers to TE
 
+> **Note:** Run from the `bionemo-recipes/models/mixtral` directory, or install dependencies first with
+> `pip install -r requirements.txt`.
+
 ```python
 from transformers import AutoModelForCausalLM
 
@@ -69,6 +75,9 @@ model_te.save_pretrained("/path/to/te_checkpoint")
 
 ### Converting from TE back to HF Transformers
 
+> **Note:** Run from the `bionemo-recipes/models/mixtral` directory, or install dependencies first with
+> `pip install -r requirements.txt`.
+
 ```python
 from convert import convert_mixtral_te_to_hf
 from modeling_mixtral_te import NVMixtralForCausalLM
@@ -80,9 +89,18 @@ model_hf.save_pretrained("/path/to/hf_checkpoint")
 
 ### Validating Converted Models
 
-To validate the converted models, refer to the commands in [Inference Examples](#inference-examples) above to load and
-test both the original and converted models to ensure loss and logit values are similar. Additionally, refer to the
-golden value tests in [test_modeling_mixtral.py](tests/test_modeling_mixtral.py).
+The golden value tests in [test_modeling_mixtral.py](tests/test_modeling_mixtral.py) verify that the converted TE model
+produces numerically equivalent outputs to the original HuggingFace model. Specifically:
+
+- `test_golden_values_bshd` — loads both models, runs a forward pass on the same input, and asserts that logits and
+  loss match within tolerance.
+- `test_round_trip_conversion` — converts HF → TE → HF and verifies the round-tripped model produces identical outputs.
+
+To run these tests locally:
+
+```bash
+./ci/scripts/recipes_local_test.py bionemo-recipes/models/mixtral/
+```
 
 ## Developer Guide
 
@@ -94,6 +112,18 @@ To run tests locally, run `recipes_local_test.py` from the repository root with
 ./ci/scripts/recipes_local_test.py bionemo-recipes/models/mixtral/
 ```
 
+### Exporting to Hugging Face Hub
+
+The model directory includes an `export.py` script that bundles all files needed for Hugging Face Hub distribution. To
+create the export bundle, run from the model directory:
+
+```bash
+python export.py
+```
+
+Before publishing, validate the export by running the local test suite via
+[recipes_local_test.py](../../ci/scripts/recipes_local_test.py).
+
 ### Development container
 
 To use the provided devcontainer, use "Dev Containers: Reopen in Container" from the VSCode menu, and choose the
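The "match within tolerance" comparison described in the rewritten Validating Converted Models section is, at its core, an elementwise rtol/atol closeness check. A dependency-free sketch of that convention (the actual tests presumably use `torch.testing` helpers on full logit tensors; values here are made up for illustration):

```python
def all_close(a, b, rtol=1e-3, atol=1e-5):
    """Elementwise closeness check, mirroring the rtol/atol convention of torch.allclose."""
    return all(abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b))

# Hypothetical logits from the HF reference model and the converted TE model:
hf_logits = [0.1234, -2.5001, 7.89]
te_logits = [0.1235, -2.5000, 7.89]
print(all_close(hf_logits, te_logits, rtol=1e-3, atol=1e-4))  # True
```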

bionemo-recipes/models/mixtral/collator.py

Lines changed: 1 addition & 1 deletion

@@ -842,7 +842,7 @@ def _process_tensor_bshd(
     total_chunks = 2 * cp_world_size
     chunk_size = seq_len // total_chunks
 
-    if chunk_size == 0:
+    if seq_len % total_chunks != 0:
        raise ValueError(
             f"Sequence length {seq_len} must be divisible by {total_chunks} "
             f"(2 * cp_world_size) for BSHD context parallelism"

bionemo-recipes/models/mixtral/convert.py

Lines changed: 7 additions & 0 deletions

@@ -71,6 +71,13 @@ def _make_merge_experts_fn(num_experts: int):
 
     Since the number of experts is dynamic (varies per model config), we use ``exec()`` to generate
     a function with exactly ``num_experts`` named parameters (weight0, weight1, ..., weightN-1).
+
+    Args:
+        num_experts: The number of expert weight parameters the generated function will accept.
+
+    Returns:
+        A callable ``(weight0, weight1, ..., weight{N-1}) -> torch.Tensor`` that stacks the
+        per-expert weight tensors into a single tensor of shape ``[num_experts, ...]``.
     """
     param_names = [f"weight{i}" for i in range(num_experts)]
     code = f"def merge_experts({', '.join(param_names)}):\n    return torch.stack([{', '.join(param_names)}])"
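The pattern this docstring describes — using `exec()` to generate a function whose signature has one named parameter per expert — can be sketched as follows. This is a standalone illustration, not the convert.py implementation: it returns a plain list where the real helper calls `torch.stack`, so the example carries no torch dependency.

```python
def make_merge_fn(num_experts: int):
    """Generate a function with exactly num_experts named parameters.

    Sketch of the exec() code-generation pattern, with a plain list
    standing in for torch.stack.
    """
    param_names = [f"weight{i}" for i in range(num_experts)]
    code = (
        f"def merge_experts({', '.join(param_names)}):\n"
        f"    return [{', '.join(param_names)}]"
    )
    namespace = {}
    exec(code, namespace)  # defines merge_experts inside the namespace dict
    return namespace["merge_experts"]

merge = make_merge_fn(3)
print(merge.__code__.co_varnames[:3])  # ('weight0', 'weight1', 'weight2')
print(merge(1, 2, 3))  # [1, 2, 3]
```

The generated function has real named parameters, which matters when the caller binds per-expert weights by keyword rather than passing a variadic `*args`.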

bionemo-recipes/models/mixtral/tests/common/__init__.py

Lines changed: 2 additions & 1 deletion

@@ -18,7 +18,8 @@
 This package provides reusable test infrastructure following HuggingFace
 transformers patterns, including:
 
-- BaseModelTest: Base test class with all common test methods - TestTolerances: Dataclass for model-specific numerical tolerances
+- BaseModelTest: Base test class with all common test methods
+- TestTolerances: Dataclass for model-specific numerical tolerances
 - Distributed testing utilities for multi-GPU tests
 - Shared fixtures for common test requirements

bionemo-recipes/models/mixtral/tests/test_export.py

Lines changed: 1 addition & 1 deletion

@@ -1,4 +1,4 @@
-# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: LicenseRef-Apache2
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
