Fix chem-msms-predict with public MassSpecGym checkpoints (instrument arg + ms_pred patches) #38
…nstalled ms_pred / CPU / PL 2.x
- predict_msms.py: add --instrument (default Orbitrap). The public MassSpecGym
checkpoints condition on an instrument (Orbitrap|QTOF); it previously defaulted
to None -> serialized to NaN -> int64 overflow crash in the gen model.
- install.sh: patch upstream ms_pred after clone for 3 bugs that break ICEBERG
forward prediction when ms_pred is pip-installed / run on CPU / with PL 2.x:
* iceberg_elucidation.py invoked predict_smis.py via a cwd-relative path
* predict_smis.py used pl.utilities.seed.seed_everything (removed in PL 2.0)
* predict_smis.py called torch.cuda.set_device(gpu_id) unconditionally (CPU-fail)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
mlederbauer left a comment
Hi! Creator of this skill here. Thank you very much for this addition, which lets users run ICEBERG on CPU and adds the missing instrument flag. Could you please merge PR hugogontijomachado#1, which contains some additional edits, into your feature branch? After that, this branch is ready to merge into main.
conda run -n ms-gen python .agents/skills/chem-msms-predict/scripts/predict_msms.py \
--smiles "c1ccccc1C(=O)OCCN" \
--gen_ckpt downloads/iceberg_dag_gen_msg_best.ckpt \
--inten_ckpt downloads/iceberg_dag_inten_msg_best.ckpt \
--collision_energies 20 40 \
--adduct "[M+H]+" \
--instrument "Orbitrap" \
--output_dir results/msms_prediction
Thank you very much! cc @bowen-bd
Thanks @hugogontijomachado for the contribution and @mlederbauer for the review!
Fix/iceberg public checkpoints
…ally runs
Verified by rebuilding the env from install.sh on macOS arm64: the build succeeds,
but ICEBERG forward prediction crashes at runtime with unpinned `ray`:
- `ray` (no `[tune]` extra) is missing pyarrow -> ModuleNotFoundError: pyarrow
- it resolves ray>=2.8, which removed `ray.tune.integration.pytorch_lightning`
that ms_pred imports eagerly -> "Can't import ray.tune"
Pinning ray[tune]==2.7.2 brings the tune extra + the integration module; ray 2.7.2
imports pkg_resources, removed in setuptools 81, so also pin setuptools<81.
After this, predict_msms.py runs end-to-end (glyphosate [M-H]- -> 168.0067, spectrum).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
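A minimal sketch of a post-install sanity check for the pins described in this commit, assuming they are exactly `ray[tune]==2.7.2` and `setuptools<81`. The helpers `check_pins` and `installed_versions` are hypothetical and not part of install.sh. Because the `[tune]` extra cannot be read back from package metadata, the sketch uses the presence of `pyarrow` as a rough proxy for it.

```python
import importlib.metadata

# Pins taken from the commit message; check_pins/installed_versions are hypothetical helpers.
RAY_PIN = "2.7.2"          # ray>=2.8 dropped ray.tune.integration.pytorch_lightning
SETUPTOOLS_MAX_MAJOR = 81  # ray 2.7.2 imports pkg_resources, removed in setuptools 81
WATCHED = ("ray", "setuptools", "pyarrow")


def check_pins(versions):
    """Return human-readable problems for a {distribution: version} mapping."""
    problems = []
    ray = versions.get("ray")
    if ray is None:
        problems.append("ray is not installed")
    elif ray != RAY_PIN:
        problems.append(f"ray=={ray}, expected ray[tune]=={RAY_PIN}")
    setuptools = versions.get("setuptools")
    if setuptools is None:
        problems.append("setuptools is not installed")
    elif int(setuptools.split(".")[0]) >= SETUPTOOLS_MAX_MAJOR:
        problems.append(f"setuptools=={setuptools}, expected setuptools<{SETUPTOOLS_MAX_MAJOR}")
    # The [tune] extra is not recorded in metadata; pyarrow presence serves as a proxy.
    if "pyarrow" not in versions:
        problems.append("pyarrow is not installed (bare ray without the [tune] extra?)")
    return problems


def installed_versions(names=WATCHED):
    """Look up installed distribution versions, skipping packages that are absent."""
    found = {}
    for name in names:
        try:
            found[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            pass
    return found


if __name__ == "__main__":
    for problem in check_pins(installed_versions()):
        print("pin problem:", problem)
```

Running it inside the `ms-gen` env prints nothing when the pins are satisfied. Checking versions avoids importing `ray` itself, which is the step that fails when the pins are wrong.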
Thanks @mlederbauer for the review and @bowen-bd for taking a look!
@bowen-bd, answering your question: @mlederbauer's PR (hugogontijomachado#1) is now merged into this branch. ✅
Before merging I rebuilt the env from the updated … The rebuild surfaced one runtime issue: with …
Branch should be ready to merge into main. Thanks both!
Summary
Make `chem-msms-predict` (ICEBERG forward prediction) actually run with the public MassSpecGym checkpoints on a standard install. Two changes:
1. `predict_msms.py`: add `--instrument` (default `Orbitrap`)
The public MassSpecGym ICEBERG checkpoints condition on an instrument feature (`Orbitrap` | `QTOF`). The skill never passed one, so `instrument` defaulted to `None`, was serialized to the candidates TSV as an empty cell, read back as `NaN`, and crashed the generator with `RuntimeError: value cannot be converted to type int64 without overflow`. Added an `--instrument` argument (default `Orbitrap`) threaded into `iceberg_prediction`.
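A minimal sketch of how an `--instrument` flag with an `Orbitrap` default could be declared, plus a demonstration of how a `None` instrument becomes an empty TSV cell. Python's `csv` module, for example, writes `None` as an empty string. The names `build_parser`, `SUPPORTED_INSTRUMENTS`, and `candidates_row_as_tsv` are hypothetical, not the skill's actual code. Restricting values with `choices` is an extra assumption: the PR only describes the argument and its default.

```python
import argparse
import csv
import io

# Hypothetical names; the PR only states the flag, its default, and the two instruments.
SUPPORTED_INSTRUMENTS = ("Orbitrap", "QTOF")


def build_parser():
    parser = argparse.ArgumentParser(description="ICEBERG forward prediction (sketch)")
    parser.add_argument("--smiles", required=True)
    # A concrete default means the instrument column is never left empty.
    parser.add_argument(
        "--instrument",
        default="Orbitrap",
        choices=SUPPORTED_INSTRUMENTS,
        help="instrument the public MassSpecGym checkpoints condition on",
    )
    return parser


def candidates_row_as_tsv(smiles, instrument):
    """Serialize one (smiles, instrument) row as TSV to show what lands on disk."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter="\t", lineterminator="\n").writerow([smiles, instrument])
    return buffer.getvalue()


args = build_parser().parse_args(["--smiles", "c1ccccc1C(=O)OCCN"])
print(args.instrument)                                  # Orbitrap
print(repr(candidates_row_as_tsv(args.smiles, None)))  # 'c1ccccc1C(=O)OCCN\t\n'
```

The empty trailing cell in the second line is what a reader such as pandas would turn back into `NaN`. With `choices`, an unsupported instrument is rejected at argument parsing, before anything reaches the model.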
2. `conda-envs/msms-agent/install.sh`: patch 3 upstream `ms_pred` bugs after clone
With `ms_pred` pip-installed and run on CPU with pytorch_lightning ≥ 2.0, ICEBERG forward prediction hits three upstream bugs (none macOS-specific):
- `iceberg_elucidation.py` invokes `predict_smis.py` via a cwd-relative path (only resolves from a repo checkout).
- `predict_smis.py` uses `pl.utilities.seed.seed_everything` (removed in PL 2.0).
- `predict_smis.py` calls `torch.cuda.set_device(gpu_id)` unconditionally (fails on CPU).
A post-clone patch step applies the three fixes so the env is usable out of the box.
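A minimal sketch of two of the three fixes as idempotent text rewrites of `predict_smis.py`. It assumes the upstream file contains the calls exactly as quoted above. The seed fix switches to `pl.seed_everything`, which PyTorch Lightning exports at the top level in both 1.x and 2.x. The CUDA fix guards `set_device` with `torch.cuda.is_available()`. The cwd-relative path fix is left out because the exact upstream invocation is not quoted here. `patch_predict_smis` and `patch_file` are hypothetical helpers; the real install.sh may apply its patches differently.

```python
import re
from pathlib import Path

# Call removed in PL 2.0 -> top-level alias available in PL 1.x and 2.x.
OLD_SEED = "pl.utilities.seed.seed_everything("
NEW_SEED = "pl.seed_everything("

# A bare set_device(gpu_id) statement on its own line; the indentation is captured and kept.
SET_DEVICE = re.compile(
    r"^(?P<indent>[ \t]*)torch\.cuda\.set_device\(gpu_id\)[ \t]*$", re.MULTILINE
)


def patch_predict_smis(source):
    """Apply the PL 2.x seed fix and a CPU-safe set_device guard.

    Both rewrites stop matching once applied, so re-running the install step is harmless.
    """
    patched = source.replace(OLD_SEED, NEW_SEED)
    return SET_DEVICE.sub(
        r"\g<indent>if torch.cuda.is_available(): torch.cuda.set_device(gpu_id)", patched
    )


def patch_file(path):
    """Rewrite a cloned predict_smis.py in place (hypothetical post-clone step)."""
    path = Path(path)
    path.write_text(patch_predict_smis(path.read_text()))
```

Keeping the guard on one line as a compound `if` statement preserves the original indentation without having to re-indent a new block. Because each rewrite is idempotent, running install.sh twice does not stack guards.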
Testing
Verified end-to-end on macOS arm64 (CPU): glyphosate MS/MS predictions for `[M-H]-` (precursor 168.0067) and `[M+H]+` (170.0213) at CE 10/20/40/60 eV produce sensible spectra (168→124, 170→88, 170→124, water losses).
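The two precursor m/z values can be cross-checked by hand. A minimal sketch, assuming glyphosate's formula C3H8NO5P, standard monoisotopic atomic masses, and a proton mass of 1.007276 Da for the `[M+H]+` / `[M-H]-` shift. Adding or removing a proton, rather than a hydrogen atom, accounts for the electron mass. `precursor_mz` is an illustrative helper, not part of the skill.

```python
# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
}
PROTON = 1.007276466812  # Da

GLYPHOSATE = {"C": 3, "H": 8, "N": 1, "O": 5, "P": 1}  # C3H8NO5P


def monoisotopic_mass(formula):
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def precursor_mz(formula, adduct):
    # Singly charged adducts only: add or remove one proton.
    shift = {"[M+H]+": PROTON, "[M-H]-": -PROTON}[adduct]
    return monoisotopic_mass(formula) + shift


print(round(precursor_mz(GLYPHOSATE, "[M-H]-"), 4))  # 168.0067
print(round(precursor_mz(GLYPHOSATE, "[M+H]+"), 4))  # 170.0213
```

Both values match the precursors reported above, which confirms the adduct handling on the prediction side.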