Release NVIDIA BioNeMo Framework v2.6.2 · NVIDIA/bionemo-framework

Updates & Improvements

Fixes numerous ESM2 model issues:
1. Finetuning metric for token classification is fixed. #946
2. Losses for finetuning were fixed for data and model parallelism. #959
3. Bug in inference script that concerns checkpoint loading is fixed. #950
Updated base Docker image to nvidia-pytorch 25.04-py3

Known Issues

Evo2 generation (i.e. bionemo-evo2/src/bionemo/evo2/run/infer.py) is broken. See issue #890. A workaround exists on branch #949, and we are working to fix this issue for the July release.
There is an NCCL communication issue on certain A100 multi-node environments. In our internal testing, we were not able to reproduce the issue reliably across environments. If you see the following error, please report it in issue #970:

[rank9]: torch.distributed.DistBackendError: NCCL error in: /opt/pytorch/pytorch/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:3356, internal error - please report this issue to the NCCL developers, NCCL version 2.26.3
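To make reports on this issue easier to file, the following is a minimal sketch of a log scanner that looks for the error signature quoted above and extracts the rank, source location, and NCCL version. The function name `find_nccl_internal_errors` and the regular expression are illustrative assumptions, not part of BioNeMo, and the pattern only covers the message format shown in these notes.

```python
import re

# Pattern derived from the single error line quoted in these release notes.
# Assumption: other environments emit the message in the same format.
NCCL_ERROR_RE = re.compile(
    r"\[rank(?P<rank>\d+)\]: torch\.distributed\.DistBackendError: "
    r"NCCL error in: (?P<location>\S+), internal error"
    r".*?NCCL version (?P<version>[\d.]+)"
)


def find_nccl_internal_errors(log_text):
    """Return one dict per log line that matches the NCCL internal-error signature."""
    hits = []
    for line in log_text.splitlines():
        match = NCCL_ERROR_RE.search(line)
        if match:
            hits.append(
                {
                    "rank": int(match["rank"]),
                    "location": match["location"],
                    "nccl_version": match["version"],
                }
            )
    return hits
```

Running this over the captured output of a multi-node job gives a short summary (affected ranks and NCCL version) that could be attached to a report in issue #970.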

What's Changed

Release notes v2.6 by @trvachov in #849
Bump to version 2.6 by @trvachov in #852
multi-gpu inference. Adds 'batch index' to the resulting prediction by @skothenhill-nv in #854
pin ngcsdk by @pstjohn in #857
fix Evo2 training crash - TE commit by @dorotat-nv in #796
Update EVO2 tests according to Hyena arch changes by @farhadrgh in #798
fixing the ESM2 checkpointing issue by @polinabinder1 in #842
add wandb group and model size in Geneformer configs: benchmarks by @dorotat-nv in #859
Skip d3pm notebook tests on B200 by @nvdreidenbach in #860
Bump NeMo to use a trunk commit instead of a branch for Evo2 fixes and inference. by @cspades in #861
remove unused dependencies from bionemo-core by @pstjohn in #862
Adding a tflops callback to Geneformer by @polinabinder1 in #856
Polinabinder/file extend by @polinabinder1 in #477
Geneformer1B updates by @skothenhill-nv in #869
upgrade pytorch to 25.04 by @balvisio in #866
Set EXPERIMENTAL_1b_CHECKPOINT to True by default by @jwilber in #840
Add Tyler to CODEOWNERS for docs by @jwilber in #880
Fix missing context in ESM2 FT checkpoint by @farhadrgh in #878
Fix CI issues on main branch. by @cspades in #868
Update and separate cell type classification benchmark by @skothenhill-nv in #874
[BIONEMO-1831] Fix the version of scikit-misc to resolve dependency issue by @balvisio in #883
Jwilber/1413 unify nb locations by @jwilber in #879
Fixes how num_layers relates to pipeline_model_parallel_size in ESM2 by @gagank1 in #829
Fix broken links and add banner by @jwilber in #891
jwilber/amplify automated benchmarks by @jwilber in #875
Add small section mentioning context extension by @jwilber in #837
Add prediction_interval in call to infer_model in infer_esm2.py by @gagank1 in #893
add flag for loading a sanity-sized dataset for AMPLIFY by @pstjohn in #899
Fix bionemo.llm.lightning.batch_collator in multi-GPU case by @gagank1 in #898
fixing vulnerabilities: setuptools and tornado by @dorotat-nv in #902
Add assertion to zeroshot notebook that the AUC is above a threshold by @jstjohn in #905
Add 2.6.1 release notes by @jwilber in #912
add chatbot ui to docs by @jwilber in #845
Fix masked token loss reductions by @skothenhill-nv in #900
Final documentation edits by @lvojtku in #894
Turn on chatbot visibility by default by @jwilber in #915
fix typo in pretrain.md by @pstjohn in #909
Add esm2 checkpoint export by @pstjohn in #918
Geneformer gene embeddings calculation. Now limits the changes to bionemo.geneformer only by @jyin-bst in #808
Remove strict comparison of tensors against golden values in evo2 test by @balvisio in #901
Remove cache-to and cache-from in devcontainer by @pstjohn in #913
Disable moco notebook tests to fix CI by @trvachov in #924
Fix broken image links in cellgene by @jwilber in #923
Add cli interface for esm2 checkpoint conversion by @pstjohn in #922
Reduce number of training steps for partial-conv: esm2 by @dorotat-nv in #929
Add create_tensorboard_logger argument to train_geneformer entrypoint by @nvmvle in #911
docs: adds an explanation for the trainer.global step oscillations by @jomitchellnv in #930
Fixing the missing tfevents dir and catching the issue in testing by @jstjohn in #926
Add option for number of constant steps of learning rate. by @Sohn123 in #907
move notebook exclusion to pyproject.toml by @dorotat-nv in #936
[BIONEMO-2042] Install 'bitsandbytes' with cuda backend by @balvisio in #932
evo2 stop and go test by @yzhang123 in #903
Update evo2 ModelCheckpoint args by @jwilber in #935
test stage specific run_pytest* files to standardise how tests are run in CIs by @dorotat-nv in #889
Cye/fix test pypi publish by @cspades in #947
Fix Geneformer test_load_data_run_benchmark by @gagank1 in #942
Fix esm2 token classification metric and loss, add flip benchmark by @yzhang123 in #946
fix bug that didn't load the head by @yzhang123 in #950
Polinabinder/scdl version fixes by @polinabinder1 in #948
expose esm-2 weight decay parameter by @pstjohn in #956
Update CODEOWNERS by @malcolmgreaves in #951
Replace shell commands with subprocess.run by @balvisio in #941
updating esm2 + geneformer to run benchmarks with data from node specific scratch by @dorotat-nv in #957
Add guard against zero masked tokens in loss reduction class. by @skothenhill-nv in #958
scdl neighbor update by @camirr-nv in #843
Fix esm2 finetune loss by @yzhang123 in #959

New Contributors

@lvojtku made their first contribution in #894
@jyin-bst made their first contribution in #808
@nvmvle made their first contribution in #911
@Sohn123 made their first contribution in #907

Full Changelog: v2.6.1...v2.6.2
