
pysam behavior with breakend alleles #927

@epiercehoffman

pysam (0.15.4, the version currently used in GATK-SV) handles END coordinates differently for breakend alt alleles than for symbolic alt alleles.

When breakend alt alleles are used and the stop coordinate is less than the start coordinate, pysam overwrites the stop coordinate with the start coordinate and removes the END field from INFO. This appears to happen as soon as the record is read, or perhaps the first time the stop coordinate is accessed. As a result, records with breakend alt alleles that have END<POS are not properly updated by running src/sv-pipeline/scripts/format_svtk_vcf_for_gatk.py with --fix-end: record.stop has already been overwritten by the start coordinate, so END2 is set to the start coordinate and END is dropped from INFO.

Conversely, when symbolic alt alleles are used, pysam allows record.stop to be less than record.pos.
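
To find legacy records that would be affected, one option is to inspect the raw VCF text before pysam parses it, since the original END value is no longer available from record.stop afterwards. The following is a minimal sketch, not part of GATK-SV: the helper names classify_alt and end_before_pos are hypothetical, the breakend pattern only covers the paired-bracket form (for example N]chr1:500]) and not single breakends, and the "at_risk" rule simply encodes the behavior described above.

```python
import re

# Paired breakend ALT, e.g. "N]chr1:500]" or "[chr1:500[N".
# Assumption: single breakends such as "N." are not handled here.
BREAKEND_ALT = re.compile(r"^[A-Za-z.]*[\[\]][^\[\]]+[\[\]][A-Za-z.]*$")


def classify_alt(alt):
    """Return 'symbolic', 'breakend', or 'other' for one ALT allele string."""
    if alt.startswith("<") and alt.endswith(">"):
        return "symbolic"
    if BREAKEND_ALT.match(alt):
        return "breakend"
    return "other"


def end_before_pos(vcf_line):
    """Parse one tab-delimited VCF data line without pysam.

    Returns a dict with POS, the raw END from INFO (or None), the ALT types,
    and 'at_risk', which is True when END < POS and any ALT is a breakend.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    pos = int(fields[1])
    alt_types = [classify_alt(a) for a in fields[4].split(",")]
    end = None
    for entry in fields[7].split(";"):
        if entry.startswith("END="):
            end = int(entry[len("END="):])
    at_risk = end is not None and end < pos and "breakend" in alt_types
    return {"pos": pos, "end": end, "alt_types": alt_types, "at_risk": at_risk}


if __name__ == "__main__":
    # Made-up example records for illustration only.
    bnd = "chr1\t1000\tbnd1\tN\tN]chr1:500]\t.\tPASS\tSVTYPE=BND;END=500"
    sym = "chr1\t1000\tdel1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=500"
    print(end_before_pos(bnd))
    print(end_before_pos(sym))
```

Records flagged this way could have END and END2 set from the raw values before the file is passed to any pysam-based script.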

This is not expected to cause issues with the current version of GATK-SV, which primarily uses symbolic alt alleles, and which sets END and END2 during VCF standardization in GatherBatchEvidence. However, we should be aware of this behavior when working with legacy data.
