
Add allele chunking and pan-HLA prediction support#341

Draft
jonasscheid wants to merge 2 commits into nf-core:dev from jonasscheid:feature/allele-chunking

Conversation

@jonasscheid
Collaborator

Summary

Closes #340

  • Implements allele chunking in prepare_prediction_input using the existing MaxNumberOfAlleles enum (previously marked # TODO: Implement)
  • Tools with allele limits (NetMHCpan/NetMHCIIpan: 50) automatically split into parallel chunks
  • Supports `all` in the samplesheet `alleles` column to trigger pan-HLA mode (predict against all supported alleles per tool)
  • Adds --max_alleles_per_chunk parameter to override per-tool defaults
  • Passes per-file allele lists through to merge_predictions for correct NetMHC allele-index resolution across chunks
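
The chunking step can be sketched roughly as follows (illustrative names only, not the pipeline's actual code):

```python
# Hypothetical sketch of allele chunking: split a tool's allele list into
# consecutive chunks no larger than the tool's limit (e.g. NetMHCpan: 50),
# so each chunk can be predicted in parallel.
def chunk_alleles(alleles, max_per_chunk):
    return [alleles[i:i + max_per_chunk]
            for i in range(0, len(alleles), max_per_chunk)]

alleles = [f"HLA-A*{i:02d}:01" for i in range(1, 8)]  # 7 alleles
chunks = chunk_alleles(alleles, 3)                    # chunk sizes: 3, 3, 1
```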

Changes

  • prepare_prediction_input.py: Allele chunking logic, all sentinel handling, per-chunk file + JSON output
  • mhc_binding_prediction/main.nf: Chunk-aware branch routing, unique per-chunk file_id for predictor output, per-file alleles carried to merge
  • merge_predictions.py: Uses per-file allele lists instead of global meta.alleles (also fixes a latent bug where unsupported alleles could cause wrong index mapping)
  • merge_predictions/main.nf: Added val(alleles_per_file) input
  • Config: New max_alleles_per_chunk param in nextflow.config, nextflow_schema.json, modules.config
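
The merge fix can be illustrated with a small sketch (names hypothetical): NetMHC output refers to alleles by their position in the chunk's input list, so the merge must index into the per-file list rather than the sample-wide `meta.alleles`.

```python
# Hypothetical illustration of the index-mapping fix: each chunk file carries
# its own allele list, and a NetMHC allele index is resolved against that
# list, not against the global allele list for the whole sample.
alleles_per_file = {
    "netmhcpan_chunk0": ["HLA-A*01:01", "HLA-A*02:01"],
    "netmhcpan_chunk1": ["HLA-B*07:02"],
}

def resolve_allele(file_id, allele_index):
    return alleles_per_file[file_id][allele_index]

name = resolve_allele("netmhcpan_chunk1", 0)  # "HLA-B*07:02"
```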

Usage

```
sample,alleles,mhc_class,filename
sample1,all,II,peptides.tsv
```
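
The `all` value acts as a sentinel for pan-HLA mode. A minimal sketch (the separator and helper names below are assumptions for illustration, not taken from the pipeline code):

```python
def resolve_alleles(alleles_str, tool_supported):
    # "all" triggers pan-HLA mode: predict against every allele the tool
    # supports; otherwise parse an explicit list (semicolon separator
    # assumed here for illustration).
    if alleles_str.strip().lower() == "all":
        return list(tool_supported)
    return [a.strip() for a in alleles_str.split(";")]

pan = resolve_alleles("all", ["HLA-DRB1*01:01", "HLA-DRB1*03:01"])
```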

Test plan

  • Run tests/peptides.nf.test (backward compat, default params = no chunking)
  • Run tests/mhcflurry.nf.test (backward compat)
  • Manual test with --max_alleles_per_chunk 1 to force chunking with mhcnuggets
  • Manual test with alleles=all in samplesheet

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
if not alleles_str:
continue
tool_default = MaxNumberOfAlleles[tool.upper()].value
max_alleles = global_max if global_max > 0 else tool_default
Collaborator Author

Suggested change
max_alleles = global_max if global_max > 0 else tool_default
max_alleles = global_max if global_max > 0 else MaxNumberOfAlleles[tool.upper()].value

for tool, alleles_str in tools_allele_input.items():
if not alleles_str:
continue
tool_default = MaxNumberOfAlleles[tool.upper()].value
Collaborator Author

Suggested change
tool_default = MaxNumberOfAlleles[tool.upper()].value

stub:
def prefix = task.ext.prefix ?: "${meta.id}"
"""
echo '{"mhcflurry":"","mhcnuggets":"","mhcnuggetsii":"","netmhcpan":"","netmhciipan":""}' > ${prefix}_allele_input.json
Collaborator Author

Is this still needed?

.map { meta, file -> [meta.findAll { k, _v -> k != 'alleles_supported' }, file] } // drop alleles_supported from meta
.groupTuple()
.join( ch_peptides_to_predict )
.map { meta, file ->
Collaborator Author

Same here; this does not look very robust.

netmhciipan: (file.name.contains('netmhciipan_input') && allele_input_dict['netmhciipan'])
return [meta + [alleles_supported: allele_input_dict['netmhciipan']], file]
// Find the JSON key matching this file (supports both "tool" and "tool_chunkN" keys)
def key = allele_input_dict.keySet().find { k -> file.name.contains("${k}_input") }
Collaborator Author

This looks very error-prone; is there a more robust design we could use here?
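
One possible direction for more robust matching (a sketch under assumptions, not the PR's code): collect all matching keys and prefer the most specific one, so a chunked key like `netmhcpan_chunk2` wins over a bare tool name if both happen to match.

```python
def match_key(filename, allele_input_dict):
    # Collect every JSON key whose "<key>_input" marker appears in the
    # filename, then prefer the most specific (longest) candidate instead
    # of taking the first substring hit.
    candidates = [k for k in allele_input_dict if f"{k}_input" in filename]
    return max(candidates, key=len) if candidates else None

key = match_key("sample1_netmhcpan_chunk2_input.tsv",
                {"netmhcpan": [], "netmhcpan_chunk2": []})  # "netmhcpan_chunk2"
```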

…atching

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
