Releases: NVIDIA/bionemo-framework
NVIDIA BioNeMo Framework v2.7
Updates & Improvements
- Evo2 model improvements:
  - Context, tensor, and data parallelism support in the prediction endpoint, as well as support for context lengths over 8192: #1123. Fixes #910 and #1048.
  - LoRA fine-tuning by @gabenavarro: #980. Note: internal CI coverage of LoRA convergence is still a work in progress, so we cannot guarantee convergence.
  - Fix a 2x memory-usage issue during Evo2 generation: NVIDIA-NeMo/NeMo#14515
  - Add flash-decode support in inference: #1000
  - Update Rotary Embedding and sequence-length defaults to address incorrect checkpoint conversion: NVIDIA-NeMo/NeMo#14514
  - Improvements to tag masking in the Evo2 loss: #1008
  - Support for Spike-no-more to improve training stability: #1011
- Added a header to SCDL archives, providing improved provenance tracking and supporting future releases. It also adds tracking of AnnData API coverage in SCDL tests. This header stores metadata about the archive and its composite arrays, including a version, the array lengths and data types, and information about the RowFeatureIndexes. This adds the features necessary to fix #999 as well as to implement simple bit-packing of the rowptr, colptr, and data arrays. It should also make SCDL more secure, enable strict compatibility checking, and open the door to further performance improvements: #1030
- `bionemo-geometric` has been deprecated and removed. The molecular-featurization tooling in this package has moved to cuik-molmaker.
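To illustrate the kind of information the SCDL archive header described above carries (a version plus per-array lengths and data types), here is a minimal sketch of a versioned binary header. The magic bytes, field layout, and the names `ArrayInfo`, `pack_header`, and `unpack_header` are all hypothetical and chosen for illustration; this is not SCDL's actual on-disk format.

```python
import struct
from dataclasses import dataclass

MAGIC = b"HDRX"          # hypothetical magic bytes, not the ones SCDL uses
FIXED_FMT = "<3HI"       # major, minor, patch (uint16 each) + array count (uint32)


@dataclass(frozen=True)
class ArrayInfo:
    """Metadata for one array in the archive (hypothetical structure)."""
    name: str
    dtype: str
    length: int


def pack_header(version, arrays):
    """Serialize a (major, minor, patch) version and array metadata to bytes."""
    out = bytearray(MAGIC)
    out += struct.pack(FIXED_FMT, *version, len(arrays))
    for info in arrays:
        for text in (info.name, info.dtype):
            raw = text.encode("utf-8")
            out += struct.pack("<H", len(raw)) + raw   # length-prefixed string
        out += struct.pack("<Q", info.length)          # element count as uint64
    return bytes(out)


def _read_text(buf, offset):
    """Read one length-prefixed UTF-8 string; return (text, new_offset)."""
    (size,) = struct.unpack_from("<H", buf, offset)
    start = offset + 2
    return buf[start:start + size].decode("utf-8"), start + size


def unpack_header(buf, supported_major=1):
    """Parse a header, rejecting unknown magic bytes or major versions."""
    if buf[:len(MAGIC)] != MAGIC:
        raise ValueError("not a recognised archive header")
    major, minor, patch, count = struct.unpack_from(FIXED_FMT, buf, len(MAGIC))
    if major != supported_major:
        # Strict compatibility check: refuse to guess at an unknown layout.
        raise ValueError(f"unsupported header major version {major}")
    offset = len(MAGIC) + struct.calcsize(FIXED_FMT)
    arrays = []
    for _ in range(count):
        name, offset = _read_text(buf, offset)
        dtype, offset = _read_text(buf, offset)
        (length,) = struct.unpack_from("<Q", buf, offset)
        offset += 8
        arrays.append(ArrayInfo(name, dtype, length))
    return (major, minor, patch), arrays


if __name__ == "__main__":
    header = pack_header((1, 0, 0), [ArrayInfo("data", "float32", 10)])
    print(unpack_header(header))
```

Storing the version first lets a reader reject an incompatible archive before interpreting any array metadata, which is one way the strict compatibility checking mentioned above could be realised.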
Known Issues
- We have removed `libtiff` from the container due to a known vulnerability, CVE-2025-9900. `libtiff` isn't directly used in any BioNeMo code; however, users might face issues with, e.g., Pillow or other common image-manipulation libraries inside this container.
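If you rely on TIFF handling inside the container, a quick check like the following can tell you whether your Pillow installation still reports libtiff support. This is a minimal sketch: the helper name `pillow_tiff_support` is hypothetical, and whether a missing codec actually affects your workload depends on how Pillow was built and which image formats you use.

```python
def pillow_tiff_support():
    """Return True/False for Pillow's libtiff codec, or None if Pillow is absent."""
    try:
        from PIL import features
    except ImportError:
        return None  # Pillow is not installed in this environment
    # check_codec reports whether Pillow was built with the named codec.
    return bool(features.check_codec("libtiff"))


if __name__ == "__main__":
    status = pillow_tiff_support()
    if status is None:
        print("Pillow is not installed")
    elif status:
        print("Pillow reports libtiff support")
    else:
        print("Pillow reports no libtiff support; compressed TIFF files may fail to load")
```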
What's Changed
- fix esm2 finetuning tutorial by @yzhang123 in #1002
- Better masking in evo2, and default use of more robust megatron loss by @jstjohn in #1008
- aws-cli install refactor in Docker container, version bump by @trvachov in #1001
- Add ESM2 Finetuning Benchmark Configuration by @nvmvle in #964
- Add README and SpeedTest. by @edawson in #1005
- update codeowners by @yzhang123 in #1009
- update evo2 zero shot notebook by @yzhang123 in #1003
- Hyena Inference Updates to support Flash Decode by @jstjohn in #1000
- Bump NeMo by @farhadrgh in #1012
- feat: added lora to evo2 by @gabenavarro in #980
- Code for general single cell benchmarking by @polinabinder1 in #969
- Evo2 spike-no-more support by @jstjohn in #1011
- Fix typos in ESM2 finetune notebook by @balvisio in #1023
- Amoradza/rearrange by @moradza in #1025
- updating setup tools in pyproject.toml of scspeedtest by @polinabinder1 in #1033
- Clean up repo root and unused internal tooling by @pstjohn in #1037
- run mdformat and bump ruff pre-commit by @pstjohn in #1036
- Add validation tests for ESM2 fine-tuning benchmark partial-conv by @nvmvle in #1010
- Add subquadratic-ops support by @farhadrgh in #1043
- revert formatting of docs page by @pstjohn in #1044
- Refactor license check script by @pstjohn in #1045
- fix confidence MDLM sampling and add ar and margin by @nvdreidenbach in #1029
- Edawson/scdl schema by @edawson in #1030
- Entropic time scheduler by @btrentini in #1024
- add changed files to github workflow by @pstjohn in #1038
- make sure framework ci runs on schedule events by @pstjohn in #1059
- Bionemo Core version update by @polinabinder1 in #1050
- remove ./internal copy in Dockerfile by @pstjohn in #1063
- update license check and ignore .gitignore in framework ci by @pstjohn in #1062
- enable per output token likelihood prediction for evo2 by @yzhang123 in #1057
- remove extra -- from pipeline parallel command by @jwilber in #1067
- Add bionemo-recipes by @pstjohn in #1052
- remove esm2 native recipe by @pstjohn in #1068
- Update README.md by @taras-sereda in #1070
- unskip evo2 tests by @broland-hat in #1058
- add some fixes for recipes ci by @pstjohn in #1072
- add bionemo-recipes CI summary job by @pstjohn in #1073
- checkout submodules in framework changed-actions by @pstjohn in #1074
- Remove index.md markdown to prevent 3.13 requirement by @jwilber in #1076
- skip get-pr-info if we're not on a pull request branch by @pstjohn in #1079
- Release v2.7rc1 by @yzhang123 in #1056
- updating esm2 native recipe by @pstjohn in #1078
- Change NeMo to new organization NVIDIA-NeMo. by @cspades in #1089
- ESM-2 Accelerate Recipes by @pstjohn in #1080
- temporarily comment out failing checkpoint by @jwilber in #1093
- feat: Add Evo2 fine-tuning partial-conv benchmarking by @nvmvle in #1028
- pin transformers in esm2 golden values by @pstjohn in #1100
- Add ESM-2 model gradient tests by @pstjohn in #1077
- Add initial model convergence workflow for BioNeMo tests by @jwilber in #1102
- ESM-2 mfsdp recipe expanded tests by @pstjohn in #1101
- bump NeMo by @farhadrgh in #1105
- Fix infer_evo2 argparse error. by @jstjohn in #1110
- moving FAST_CI_MODE flag to pytest script by @dorotat-nv in #1111
- add emacs-nox [bnm2717] by @broland-hat in #1107
- run all recipes on a single node by @pstjohn in #1108
- fix memory error on nightly a bnm2712 by @broland-hat in #1104
- pin torch version in amplify by @pstjohn in #1112
- with zstd compression lets just run the steps individually by @pstjohn in #1117
- only xfail thd tests if cuda arch is unsupported by @pstjohn in #1118
- Update model name in L1_3B_ddp.yaml by @jwilber in #1119
- Add wandb mode arg by @jwilber in #1120
- adding shared_eden_dataloader. by @yzhang123 in #1109
- [BENCHMAKRS] fix jet evo2 pretrain for partial conv by @dorotat-nv in #1115
- add partial conv tests to esm2_accelerate recipe by @pstjohn in #1122
- make sure we initialize accelerator before model by @pstjohn in #1132
- Optimize load() by avoiding redundant hash checks and unpacking by @antonvnv in #1081
- [BENCHMARKS] fixing evo2 finetune config by @dorotat-nv in #1097
- rename nvfsdp to mfsdp globally by @pstjohn in #1137
- Moving bionemo-core loads to bionemo-scdl, so t...
NVIDIA BioNeMo Framework v2.6.3
Updates & Improvements
- Fixes numerous issues with Evo2 model:
- ESM2 LoRA model inference issue resolved. #996
- Added experimental evo2-mamba model. #888
- Updated base Docker image to nvidia-pytorch 25.06-py3
- NCCL issue in ESM2 pretraining resolved. #970
What's Changed
- Fix test_train_evo2_stops test by @balvisio in #965
- Enable test_train_evo2_stop_at_max_steps_and_continue. by @balvisio in #966
- automated benchmarks: esm2 650M training analogous to bionemo-recipes by @dorotat-nv in #975
- Fix database path in esm2_pretrain_recipes by @pstjohn in #978
- Add fp8 stop and go test for evo2 by @jwilber in #974
- Update Docs Banner for GitHub Pages-hosted Docs by @tshimko-nv in #981
- Add release notes for v2.6.2 (25.06) by @trvachov in #971
- Evo2 Generation fixes and necessary base dependency and container updates. Large change. by @jwilber in #949
- Point NeMo submodule back to main repo by @trvachov in #984
- Use new b2b kernels in evo2 jet tests by @jwilber in #985
- change where dtype is found in checkpoint export by @pstjohn in #989
- Evo2 Mamba by @jstjohn in #888
- Adding inference CDS length tests by @jstjohn in #991
- Fix PIL CVE by @trvachov in #992
- [BIONEMO-2334] Patch TE to fix Evo2 stop and go training by @balvisio in #987
- Fix bug in evo2-mamba train and add test by @jstjohn in #994
- Fix esm2 lora inference by @yzhang123 in #996
- Reset parameters for the ESM-2 contact head on HF export by @pstjohn in #983
Full Changelog: v2.6.2...v2.6.3
NVIDIA BioNeMo Framework v2.6.2
Updates & Improvements
- Fixes numerous ESM2 model issues:
- Updated base Docker image to nvidia-pytorch 25.04-py3
Known Issues
- Evo2 generation is broken (i.e. `bionemo-evo2/src/bionemo/evo2/run/infer.py`). See issue #890. A workaround exists on branch #949 and we are working to fix this issue for the July release.
- There is an NCCL communication issue on certain A100 multi-node environments. In our internal testing, we were not able to reproduce the issue reliably across environments. If end users see the following error, please report it in issue #970:
[rank9]: torch.distributed.DistBackendError: NCCL error in: /opt/pytorch/pytorch/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:3356, internal error - please report this issue to the NCCL developers, NCCL version 2.26.3
What's Changed
- Release notes v2.6 by @trvachov in #849
- Bump to version 2.6 by @trvachov in #852
- multi-gpu inference. Adds 'batch index' to the resulting prediction by @skothenhill-nv in #854
- pin ngcsdk by @pstjohn in #857
- fix Evo2 training crash - TE commit by @dorotat-nv in #796
- Update EVO2 tests according to Hyena arch changes by @farhadrgh in #798
- fixing the ESM2 checkpointing issue by @polinabinder1 in #842
- add wandb group and model size in Geneformer configs: benchmarks by @dorotat-nv in #859
- Skip d3pm notebook tests on B200 by @nvdreidenbach in #860
- Bump NeMo to use a trunk commit instead of a branch for Evo2 fixes and inference. by @cspades in #861
- remove unused dependencies from bionemo-core by @pstjohn in #862
- Adding a tflops callback to Geneformer by @polinabinder1 in #856
- Polinabinder/file extend by @polinabinder1 in #477
- Geneformer1B updates by @skothenhill-nv in #869
- upgrade pytorch to 25.04 by @balvisio in #866
- Set EXPERIMENTAL_1b_CHECKPOINT to True by default by @jwilber in #840
- Add Tyler to CODEOWNERS for docs by @jwilber in #880
- Fix missing context in ESM2 FT checkpoint by @farhadrgh in #878
- Fix CI issues on main branch. by @cspades in #868
- Update and separate cell type classification benchmark by @skothenhill-nv in #874
- [BIONEMO-1831] Fix the version of scikit-misc to resolve dependency issue by @balvisio in #883
- Jwilber/1413 unify nb locations by @jwilber in #879
- Fixes how num_layers relates to pipeline_model_parallel_size in ESM2 by @gagank1 in #829
- Fix broken links and add banner by @jwilber in #891
- jwilber/amplify automated benchmarks by @jwilber in #875
- Add small section mentioning context extension by @jwilber in #837
- Add prediction_interval in call to infer_model in infer_esm2.py by @gagank1 in #893
- add flag for loading a sanity-sized dataset for AMPLIFY by @pstjohn in #899
- Fix bionemo.llm.lightning.batch_collator in multi-GPU case by @gagank1 in #898
- fixing vulnerabilities: setuptools and tornado by @dorotat-nv in #902
- Add assertion to zeroshot notebook that the AUC is above a threshold by @jstjohn in #905
- Add 2.6.1 release notes by @jwilber in #912
- add chatbot ui to docs by @jwilber in #845
- Fix masked token loss reductions by @skothenhill-nv in #900
- Final documentation edits by @lvojtku in #894
- Turn on chatbot visibility by default by @jwilber in #915
- fix typo in pretrain.md by @pstjohn in #909
- Add esm2 checkpoint export by @pstjohn in #918
- Geneformer gene embeddings calculation. Now limits the changes to bionemo.geneformer only by @jyin-bst in #808
- Remove strict comparison of tensors against golden values in evo2 test by @balvisio in #901
- Remove cache-to and cache-from in devcontainer by @pstjohn in #913
- Disable moco notebook tests to fix CI by @trvachov in #924
- Fix broken image links in cellgene by @jwilber in #923
- Add cli interface for esm2 checkpoint conversion by @pstjohn in #922
- Reduce number of training steps for partial-conv: esm2 by @dorotat-nv in #929
- Add `create_tensorboard_logger` argument to `train_geneformer` entrypoint by @nvmvle in #911
- docs: adds an explanation for the trainer.global step oscillations by @jomitchellnv in #930
- Fixing the missing tfevents dir and catching the issue in testing by @jstjohn in #926
- Add option for number of constant steps of learning rate. by @Sohn123 in #907
- move notebook exclusion to pyproject.toml by @dorotat-nv in #936
- [BIONEMO-2042] Install 'bitsandbytes' with cuda backend by @balvisio in #932
- evo2 stop and go test by @yzhang123 in #903
- Update evo2 ModelCheckpoint args by @jwilber in #935
- test stage specific run_pytest* files to standardise how tests are run in CIs by @dorotat-nv in #889
- Cye/fix test pypi publish by @cspades in #947
- Fix Geneformer test_load_data_run_benchmark by @gagank1 in #942
- Fix esm2 token classification metric and loss, add flip benchmark by @yzhang123 in #946
- fix bug that didn't load the head by @yzhang123 in #950
- Polinabinder/scdl version fixes by @polinabinder1 in #948
- expose esm-2 weight decay parameter by @pstjohn in #956
- Update CODEOWNERS by @malcolmgreaves in #951
- Replace shell commands with subprocess.run by @balvisio in #941
- updating esm2 + geneformer to run benchmarks with data from node specific scratch by @dorotat-nv in #957
- Add guard against zero masked tokens in loss reduction class. by @skothenhill-nv in #958
- scdl neighbor update by @camirr-nv in #843
- Fix esm2 finetune loss by @yzhang123 in #959
New Contributors
- @lvojtku made their first contribution in #894
- @jyin-bst made their first contribution in #808
- @nvmvle made their first contribution in #911
- @Sohn123 made their first contribution in #907
Full Changelog: v2.6.1...v2.6.2
NVIDIA BioNeMo Framework v2.6.1
Updates & Improvements
- Fixes around ESM2 pretraining and fine-tuning checkpoints.
- Added sanity dataset for AMPLIFY testing.
- Tested against A100 brev instances.
- Update `tornado` package to `>6.5.0` to fix container CVEs.
Full Changelog: v2.6...v2.6.1
NVIDIA BioNeMo Framework v2.6
New Features
- Adds support for AMPLIFY (doi:10.1101/2024.09.23.614603) pre-training and inference, offering a 70% speedup over the xformers-based attention backend with similar final perplexity values at 1M pre-training steps (4.23 for 120M, 3.05 for 350M). The model is fully compatible with existing weights on HuggingFace.
- Adds alpha support for LoRA fine-tuning for ESM2 models. Inference and fine-tuning are enabled, along with resumption from a checkpoint.
Updates & Improvements
- Blackwell support, tested on B200 systems.
- Fixed Grace CPU support, released ARM compatible container.
What's Changed
- hotfix: docker build in CI by @dorotat-nv in #756
- updated version for 2.5 release by @dorotat-nv in #755
- update evo2 partial conv max steps by @dorotat-nv in #736
- bump ruff to 0.9.10 and reformat files as necessary by @pstjohn in #751
- Updated file paths for images in SCDL README by @polinabinder1 in #758
- hotfix CI: failing test test_train_evo2_stops by @dorotat-nv in #761
- adding v2.5 release notes by @dorotat-nv in #764
- switch to GHA runners by @pstjohn in #734
- Jwilber/update evo2 readme and assets by @jwilber in #759
- Aligning directories with tensorboard logs for ESM2 and Evo2 by @dorotat-nv in #740
- Remove Evo2 PR announcement now that everything is merged by @jstjohn in #772
- switch cache-from flags by @pstjohn in #773
- Add AMPLIFY model and huggingface conversion scripts by @pstjohn in #640
- parallelize test stages in GitHub CI by @dorotat-nv in #768
- D3pm blackwell testing stability fix by @nvdreidenbach in #743
- fix: attempts to update geneformer notebooks by @jomitchellnv in #745
- [hotfix] setting as "ignore" the failing notebook geneformer_cellxgene_tutorial.ipynb by @dorotat-nv in #779
- Remove xformers install by @pstjohn in #781
- Remove outdated evo2 tutorial that is now in the submodule by @jstjohn in #783
- Blackwell compatibility changes by @trvachov in #707
- Add AMPLIFY inference by @pstjohn in #775
- Pin griffe to 1.6.2 by @pstjohn in #789
- hotfix: evo2 divergence - downgrade TE to v1.13 by @dorotat-nv in #791
- Updates tflops chart for Geneformer. by @jomitchellnv in #785
- change checkpoint name pattern by @farhadrgh in #786
- Revert commit 67a869b (TE_VERSION=v1.13 fix) by @dorotat-nv in #795
- [cye/subpack-gpu-testing] Add GPU runner to testing job. by @cspades in #776
- Dockerfile improvements for ARM by @trvachov in #777
- Remove llama-index from container to fix CVEs by @trvachov in #800
- Bump 3rdparty/NeMo from `cc8ff45` to `384ff02` by @dependabot in #792
- Add local clone script by @nvdreidenbach in #787
- Fix ARM docker build by @trvachov in #801
- [cye/ml-subpackage-ci] Onboard bionemo-llm and bionemo-noodles to the sub-package CI. by @cspades in #809
- Update README.md link by @nvdreidenbach in #812
- Have dependabot update our docker base image by @pstjohn in #813
- Add .codecov.yml status checks by @pstjohn in #618
- Add AMPLIFY model documentation, minor type fixes by @pstjohn in #788
- Remove import guard in bionemo-llm by @pstjohn in #804
- Bump rust from 1.82.0 to 1.86.0 by @dependabot in #819
- Bump crossbeam-channel from 0.5.13 to 0.5.15 in /sub-packages/bionemo-noodles by @dependabot in #818
- Pbinder/geneformer partial conv by @polinabinder1 in #802
- [cye/rapids-sc-install] Add rapids_singlecell import to BioNeMo FW container image. by @cspades in #816
- Biopharma mailing list docs addition. by @trvachov in #822
- unify the implementation of early training termination across BioNeMo subpackages and update benchmarks by @dorotat-nv in #803
- Fix bitsandbytes issue on ARM by @trvachov in #824
- Fixes for AMPLIFY QA scripts by @pstjohn in #825
- updated configs for benchmarks by @dorotat-nv in #833
- Remove temporary pins in docs build by @pstjohn in #828
- Adding baseline metrics for benchmarking ESM2 model by @ShevaNguyen in #831
- Updates docs for geneformer training, inference, and cellxclassification by @jomitchellnv in #823
- Add pre commit to verify test status by @pstjohn in #841
- fix geneformer image paths by @jomitchellnv in #839
- fix geneformer image links by @jomitchellnv in #844
- ESM2 PEFT by @polinabinder1 in #766
- Pbinder/esm2 document by @polinabinder1 in #846
- h11 CRIT vuln fix by @trvachov in #847
- Docs fix by @trvachov in #826
New Contributors
- @ShevaNguyen made their first contribution in #831
Full Changelog: v2.5...v2.6
NVIDIA BioNeMo Framework v2.5
New Features
- Adding the Evo2 model training workflow, including data preprocessing, pre-training, fine-tuning and inference with bf16 and fp8 support.
Updates & Improvements
- Supporting/upgrading federated learning examples of BioNeMo in NVFlare
- Upgrade bionemo-moco to v0.0.2
- Brev.dev launchable tutorials
What's Changed
- Bump 3rdparty/Megatron-LM from `2a9793d` to `a0365bc` by @dependabot in #692
- Bump 3rdparty/NeMo from `48f10af` to `ee28bc5` by @dependabot in #693
- add announcement README.md by @ntadimeti in #695
- Adjust ESM2 fine-tuning to allow NVFlare usecases by @farhadrgh in #689
- Upgrade bionemo-moco to v0.0.2 by @nvdreidenbach in #688
- disable metric when model parallel by @sichu2023 in #701
- bump NeMo by @farhadrgh in #703
- split trufflehog scan into two actions, run on entire repo on scheduled event by @pstjohn in #696
- cve vulnerability on main by @dorotat-nv in #709
- move trufflehog scan to new action by @pstjohn in #721
- Pstjohn/trufflehog move action 2 by @pstjohn in #722
- Evo2 by @jstjohn in #694
- Trigger and skip trufflehog scan in merge group by @pstjohn in #728
- remove zstandard to address nvbug 5149698 by @pstjohn in #726
- JET for evo2: 1b model training by @dorotat-nv in #727
- If desired, training can be stopped on a specific step without impacting the LR curve. by @jstjohn in #739
- Cleanup any new files made by notebook tests by @jstjohn in #748
- Adding bf16 fine-tuned variant of evo2 1b checkpoint by @jstjohn in #747
- Bump nemo version to have the 1b checkpoint fix by @jstjohn in #729
- GTC Evo2 Demo Notebooks by @jwilber in #724
- [cye/subpack-ci] Add sub-package build, test, and publish to OSS. (WORK IN PROGRESS - PENDING MORE SUB-PACKAGE COVERAGE) by @cspades in #725
- Disable notebook and slow tests from running in merge queue by @pstjohn in #754
- fix: removes BIONEMO_HOME from repository [JIRA-BIONEMO-482] by @jomitchellnv in #742
- Update brev.dev badges to launchable built off main branch by @jwilber in #752
- Evo2 modelcard by @jstjohn in #746
- [cye/fix-subpack-ci] Fix bug where workflow dispatch collected packages are not passed to the next job. by @cspades in #753
- reduced mem to 12gb by @nvdreidenbach in #730
- Initial commits prepping for nv-gha-runners by @pstjohn in #733
- xfail evo2 long context train test by @dorotat-nv in #732
New Contributors
- @ntadimeti made their first contribution in #695
Full Changelog: v2.4.1...v2.5
NVIDIA BioNeMo Framework v2.4.1
What's Changed
Fixes ESM2 metric logging, which raised a NotImplementedError when using model parallelism.
Full Changelog: v2.4...v2.4.1
NVIDIA BioNeMo Framework v2.4
New Features
- Draft implementation of Evo2 with support for Hyena operators
- bionemo-moco v0.0.1 released for building diffusion-like generative models.
Updates & Improvements
- ESM2 fine-tuning script with CLI (finetune_esm2) that supports sequence-level/token-level classification/regression using a CSV dataset.
- Brev.dev launchable fine-tuning tutorial for ESM2
What's Changed
- bump nemo and remove manual tensorstore install by @pstjohn in #619
- remove the apex and TE build steps from our docker container by @pstjohn in #611
- Adds bionemno-esm2 section to CODEOWNERS by @jomitchellnv in #627
- LR multiplier for ESM2 finetuning layers by @farhadrgh in #609
- fix perplexity logging by @sichu2023 in #622
- Allow finetuning ESM2 with [un]frozen encoder by @farhadrgh in #620
- ESM-2 to NeMo checkpoint conversion by @pstjohn in #537
- remove PerplexityCallback in pydantic api by @sichu2023 in #636
- catch ngc api key validation errors and default to not using an api key by @pstjohn in #635
- New approvals workflow by @pstjohn in #639
- 2.3 (25.01) release notes by @trvachov in #641
- short script to initialize environment for devcontainer by @pstjohn in #625
- Don't upload merge queue results to codecov by @pstjohn in #637
- Pin triton version to avoid import error by @pstjohn in #642
- fix devcontainer initialize script by @pstjohn in #648
- Fix geneformer notebook tests by removing 10m_bnmo2 model by @pstjohn in #649
- ignore labels in inference CSV data by @farhadrgh in #652
- Mark geneformer test_pretrain_cli as slow by @pstjohn in #651
- edit to approval workflow to avoid marking a failed action by @pstjohn in #650
- added slow test label and execute full testing suite before merge by @dorotat-nv in #634
- templates for bugs and feature requests by @dorotat-nv in #647
- Instructions for uploading a package to pypi by @polinabinder1 in #638
- add timing callback by @sichu2023 in #657
- add options for pytest duration logging by @pstjohn in #656
- Bump 3rdparty/NeMo from `0cd990d` to `6d90758` by @dependabot in #660
- Bump 3rdparty/Megatron-LM from `4fb4c3d` to `0e85db5` by @dependabot in #661
- add back transformer engine install by @pstjohn in #658
- Update MoCo Version and MDLM params by @nvdreidenbach in #632
- Fix nightly container link in README by @pstjohn in #666
- Changes to SCDL and documentation by @polinabinder1 in #643
- update issue templates by @dorotat-nv in #668
- improve readme by @yzhang123 in #665
- Dependency graph by @polinabinder1 in #659
- Bump 3rdparty/NeMo from `6d90758` to `48f10af` by @dependabot in #676
- Support NVFlare sequence-level classification fine-tuning by @farhadrgh in #664
- Bump 3rdparty/Megatron-LM from `0e85db5` to `2a9793d` by @dependabot in #675
- Update pytorch base image by @pstjohn in #670
- clean up distributed env setup and support multi-device testing by @sichu2023 in #535
- support arbitrary metric logging from torchmetrics by @sichu2023 in #677
- Add scheduled nightly tests on github CI by @pstjohn in #687
Full Changelog: v2.3...v2.4
NVIDIA BioNeMo Framework v2.3
New Features
- Distributed Inference Support for ESM2 and Geneformer
  - Inference throughput scales linearly as the number of GPUs increases.
  - See the ESM2 inference notebook and use the `--num-gpus` parameter.
Updates & Improvements
- Fixed the accuracy regression in Geneformer inference on H100.
- Base image updated to `nvcr.io/nvidia/pytorch:24.12-py3`; Python updated to 3.12, among other core dependency upgrades (see the base container release notes).
Changes
- Distributed Inference Support for ESM2/Geneformer by @farhadrgh in #482
- Flexible memory management to avoid fragmentation-related CUDA OOM by @farhadrgh in #524
- Update nightly Docker image tag by @tshimko-nv in #539
- set UV_NO_CACHE by @pstjohn in #529
- RowFeatureIndex Optimization by @polinabinder1 in #531
- Updates to NvFaidx, Fasta Noodles, and Sequence Accessor by @skothenhill-nv in #532
- Fix csv dataset by @holgerroth in #543
- Run all pytests even if submodules fail by @pstjohn in #545
- xFail known bad tests on H100 and fix CVEs by @gagank1 in #549
- Fully Integrate SCDL into Geneformer by @savitha-eng in #480
- Fix MLM loss ignore idx by @farhadrgh in #552
- Attempts to bump the base image to pytorch:24.07 by @pstjohn in #544
- Pstjohn/update base image 2410 by @pstjohn in #551
- [BUFIX] fail when passed fastas with duplicate sequence ids by @skothenhill-nv in #555
- Update ddp config to improve ESM-2 15B MFU by @sichu2023 in #520
- add temporary mistune pin to fix docs build issue by @pstjohn in #559
- Bump 3rdparty/Megatron-LM from `99f23d2` to `2da43ef` by @dependabot in #558
- Bump 3rdparty/NeMo from `06e6703` to `06a1491` by @dependabot in #538
- update base image to 24.12 by @pstjohn in #553
- Un-xfail geneformer on H100 test by @trvachov in #563
- update devcontainer for new ubuntu base image by @pstjohn in #566
- don't eagerly download esm2 checkpoints by @pstjohn in #567
- run pytest with or without docs and notebooks in run_pytest.sh by @dorotat-nv in #569
- Jwilber/bionemo example small updates by @jwilber in #561
- remove unused file from repo by @jwilber in #562
- add initial configs for perf testing on ESM2 in JET (bionemo2) by @dorotat-nv in #497
- Add pre-training page for ESM-2 by @pstjohn in #578
- Edits to README and CONTRIBUTING.md, moving some text around by @pstjohn in #577
- Refactor dockerfile for better caching and avoid pbss download in notebook test by @pstjohn in #573
- Bump 3rdparty/NeMo from `06a1491` to `d44ed44` by @dependabot in #580
- Simplify ESM2 finetune test by @farhadrgh in #576
- default to overlap_param_gather by @sichu2023 in #582
- Bump 3rdparty/Megatron-LM from `2da43ef` to `65720c8` by @dependabot in #579
- Add self-hosted azure runner workflows by @pstjohn in #587
- ARM docker build with 24.12 pytorch fw image by @trvachov in #581
- Add gpu target identificator to JET configs by @dorotat-nv in #586
- add codecov badge by @pstjohn in #588
- Add support for marking and skipping slow tests, temporarily mark pydantic tests as slow by @pstjohn in #589
- pin cdifflib version by @pstjohn in #593
- Remove outdated note on very large datasets in MultiEpochDataset by @pstjohn in #521
- Bump 3rdparty/NeMo from `eb9848b` to `abd4bf7` by @dependabot in #597
- Revert "pin cdifflib version (#593)" by @pstjohn in #599
- fix esm2_pretrain.yaml by @dorotat-nv in #600
- add myself to ci by @nvdreidenbach in #594
- Bump virtualenv from 20.26.3 to 20.26.6 by @dependabot in #596
- Bump 3rdparty/Megatron-LM from `65720c8` to `c76410a` by @dependabot in #592
- move load calls, rename test for better readibility by @pstjohn in #601
- Only run cleanup if tests ran, adds pytest marker config for slow tests by @pstjohn in #595
- only run trufflehog on diff by @pstjohn in #604
- run trufflehog on entire main branch on push action by @pstjohn in #605
- add comments to the unit-test.yaml file by @pstjohn in #606
- Remove v2.0 from README title by @pstjohn in #602
- ESM2 Finetuning refactor by @farhadrgh in #574
- fix image links in esm2 model card by @pstjohn in #584
- Release of v1.0 of BioNeMo Modular Co-Design (MoCo) by @nvdreidenbach in #575
- fix devcontainer paths in ubuntu 24 by @pstjohn in #610
- Bump rsync and other dockerfile lints by @pstjohn in #603
- Jm/codeowners revamp by @jomitchellnv in #617
- Update MoCo notebooks by @nvdreidenbach in #614
- set min seq len by default by @pstjohn in #621
- hotfix for some failing python tests due to NGC files being moved around by @pstjohn in #626
- Bump 3rdparty/Megatron-LM from `c76410a` to `4fb4c3d` by @dependabot in #624
- revert ESM2 Finetuning refactor (#574) by @farhadrgh in #628
New Contributors
- @holgerroth made their first contribution in #543
- @nvdreidenbach made their first contribution in #594
Full Changelog: v2.2...v2.3
NVIDIA BioNeMo Framework v2.2
New Features
- Small Molecule Featurization
  - Implemented elementary and advanced atom, bond, and full molecule featurizers.
- GH200 Support for BioNeMo
  - Added a `Dockerfile.arm` that builds a BioNeMo container that runs on GH200 machines.
  - Publish a version of the BioNeMo container that supports multiple architectures to NGC.
Updates & Improvements
- Single-Cell Dataloader (SCDL)
  - Changed metadata storage to `parquet` files, which creates a 30x speedup when iterating over a large dataset.
  - Added functionality to concatenate several `anndata` files without doubling disk memory usage.
- ESM2
  - Added support for `SIGTERM` preemption checkpoint saving.
  - Moved ESM-2 and Geneformer training scripts to new executables, `train_esm2` and `train_geneformer`, respectively.
  - Moved inference script to a new executable `infer_esm2`, and deprecated the inference example in the fine-tuning tutorial.
  - Added new Jupyter notebook tutorials for inference and zero-shot protein design. These notebooks can be deployed on cloud resources as a brev.dev launchable.
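The SIGTERM preemption checkpoint saving noted above follows a common pattern: a signal handler only sets a flag, and the training loop saves a checkpoint at the next step boundary. The sketch below shows that general pattern in plain Python. It is not the callback BioNeMo uses; `PreemptionFlag`, `train`, and `save_checkpoint` are hypothetical names introduced for illustration.

```python
import signal


class PreemptionFlag:
    """Records that a termination signal arrived (hypothetical helper)."""

    def __init__(self):
        self.requested = False

    def handle(self, signum, frame):
        # Keep the handler tiny: just set a flag and let the training loop
        # decide when it is safe to write a checkpoint.
        self.requested = True

    def install(self, signum=signal.SIGTERM):
        # Must be called from the main thread of the process.
        signal.signal(signum, self.handle)


def train(max_steps, save_checkpoint, flag, on_step=None):
    """Run up to max_steps; on preemption, checkpoint and stop early.

    Returns the last step that was completed.
    """
    for step in range(1, max_steps + 1):
        if on_step is not None:
            on_step(step)              # stand-in for the real training step
        if flag.requested:
            save_checkpoint(step)      # save at a step boundary, then exit
            return step
    return max_steps


if __name__ == "__main__":
    flag = PreemptionFlag()
    flag.install()
    last = train(5, lambda s: print(f"checkpoint saved at step {s}"), flag)
    print(f"stopped after step {last}")
```

Deferring the save to a step boundary avoids writing a checkpoint while optimizer state is mid-update, which is the main reason the handler itself does no I/O.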
Known Issues
- Loading a checkpoint for Geneformer inference on H100 has a known regression in accuracy. Work is in progress to resolve by next release.
Changes
- Move ESM2 scripts to sub-packages by @farhadrgh in #406
- WAR: sets checkpoint filename to be more unique by @skothenhill-nv in #429
- Update NeMo and Megatron to TOT by @pstjohn in #424
- re-enable merge groups to trigger blossom-ci by @pstjohn in #431
- Revert "re-enable merge groups to trigger blossom-ci" by @pstjohn in #434
- Updated notebook, and nemo2 checkpoint with geneformer by @jstjohn in #430
- add pre-emption callback to esm2 train by @pstjohn in #433
- add rdkit dependency to bionemo-geometric by @sveccham in #432
- eliminate the need for NGC login - bionemo2 by @dorotat-nv in #440
- Add documentation and release info to README by @sirelkhatim in #447
- Bump 3rdparty/Megatron-LM from `aded519` to `5438d15` by @dependabot in #444
- Launchable notebooks in docs! by @jstjohn in #451
- Cache dev build from our nightly public container by @jstjohn in #462
- set num_workers to 1 for esm2 tests by @pstjohn in #461
- ESM2 Tutorial Updates by @farhadrgh in #426
- BugFix: fix bugs on bionemo-size-aware-batching by @guoqing-zhou in #449
- Fix typos in geneformer benchmark description by @jstjohn in #470
- Pillow version bump into main by @polinabinder1 in #465
- Refactor SCDL Row Feature Index for Performance Improvement (Rebased) by @savitha-eng in #466
- pin correct tornado requirement by @polinabinder1 in #474
- Updating Brev.Dev documentation by @polinabinder1 in #483
- Add release notes for v2.1 by @tshimko-nv in #468
- Update VERSION by @polinabinder1 in #488
- Atom and bond features by @sveccham in #453
- Molecule featurizer and molecule graph by @sveccham in #484
- hillst/bionemo noodles by @skothenhill-nv in #458
- update collate mask_value by @pstjohn in #485
- override checkpoint precision by @farhadrgh in #475
- JSON -> YAML for CLI by @skothenhill-nv in #436
- [QA Bug] Remove NGC dependency by @farhadrgh in #494
- Bump 3rdparty/NeMo from `e2b0f0e` to `06e6703` by @dependabot in #486
- Bump 3rdparty/Megatron-LM from `5438d15` to `844119f` by @dependabot in #496
- change source for coverage report by @pstjohn in #495
- Pstjohn/stop and go test non validation by @pstjohn in #476
- Add support on num steps for learning rate scheduler by @sichu2023 in #489
- Initial compatibility testing images by @malcolmgreaves in #438
- Conda-Based Compatibility Test Images by @malcolmgreaves in #507
- Instructions on compatibility image build by @malcolmgreaves in #512
- Formatting by @malcolmgreaves in #513
- Pstjohn/fix ci by @pstjohn in #515
- [FEA][webdatamodule]: support webdataset invocable by @DejunL in #501
- GH200 support by @gagank1 in #369
- Remove quotes for Jupyter command on startup in init guide by @tshimko-nv in #523
- Reduce esm2 and geneformer test burden by @sichu2023 in #499
- [v2.2] Publish release notes for BioNeMo FW v2.2. by @cspades in #522
- Disable validation/test stages in ESM-2 and Geneformer by @sichu2023 in #492
- CI HOTFIX: ignore in `run_pytest.sh` a notebook by @dorotat-nv in #526
- added NeMoLogger unit tests by @dorotat-nv in #511
- Bump 3rdparty/Megatron-LM from `844119f` to `99f23d2` by @dependabot in #528
- [cye/wandb-fix] Fix WandB issue. by @cspades in #530
- xFail known bad tests on H100 and fix CVEs by @gagank1 in #547
New Contributors
- @sveccham made their first contribution in #432
- @sirelkhatim made their first contribution in #447
Full Changelog: v2.1...v2.2