Description
In pysam (version 0.15.4, the version currently used in GATK-SV), END coordinates are handled differently for records with breakend alt alleles than for records with symbolic alt alleles.
When a record uses a breakend alt allele and its stop coordinate is less than its start coordinate, pysam overwrites the stop with the start coordinate and removes END from INFO. This appears to happen as soon as the record is read, or possibly the first time the stop coordinate is accessed. As a result, running src/sv-pipeline/scripts/format_svtk_vcf_for_gatk.py with --fix-end does not correctly update breakend records that have END<POS: because record.stop has already been overwritten with the start coordinate, END2 is set to the start coordinate and END is dropped from INFO.
Conversely, when symbolic alt alleles are used, pysam allows record.stop to be less than record.pos.
This is not expected to cause issues with the current version of GATK-SV, which primarily uses symbolic alt alleles, and which sets END and END2 during VCF standardization in GatherBatchEvidence. However, we should be aware of this behavior when working with legacy data.
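When auditing legacy data, one way to find records affected by this behavior is to read INFO/END from the raw VCF text, so that pysam never has a chance to rewrite the stop coordinate. The sketch below is not part of GATK-SV; the function names (is_breakend_alt, parse_info_end, find_breakend_end_before_pos, open_vcf) are hypothetical. It assumes a plain or gzip-compressed VCF with the standard tab-separated columns and only recognizes bracketed mate breakends (e.g. N[chr2:300[), not single breakends.

```python
import gzip
from typing import Iterable, Iterator, NamedTuple, Optional


class AffectedRecord(NamedTuple):
    chrom: str
    pos: int
    record_id: str
    alt: str
    end: int


def is_breakend_alt(alt: str) -> bool:
    """True for bracketed breakend notation such as N[chr2:300[ or ]chr1:1000]N.

    Assumption: single breakends (e.g. "N.") are not detected here.
    """
    return "[" in alt or "]" in alt


def parse_info_end(info: str) -> Optional[int]:
    """Return INFO/END as an int, or None if END is absent."""
    if info == ".":
        return None
    for field in info.split(";"):
        key, _, value = field.partition("=")
        # Exact key match, so END2 is not mistaken for END.
        if key == "END" and value:
            return int(value)
    return None


def find_breakend_end_before_pos(lines: Iterable[str]) -> Iterator[AffectedRecord]:
    """Yield breakend records whose raw INFO/END is less than POS."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        cols = line.rstrip("\n").split("\t")
        chrom, pos, record_id, _ref, alt, _qual, _filter, info = cols[:8]
        end = parse_info_end(info)
        if end is None:
            continue
        alts = alt.split(",")
        if any(is_breakend_alt(a) for a in alts) and end < int(pos):
            yield AffectedRecord(chrom, int(pos), record_id, alt, end)


def open_vcf(path: str):
    """Open a plain or gzip-compressed VCF as text (hypothetical helper)."""
    opener = gzip.open if path.endswith(".gz") else open
    return opener(path, "rt")


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "chr1\t5000\tbnd_1\tN\tN]chr1:1000]\t.\tPASS\tSVTYPE=BND;END=1000\n",
        "chr1\t5000\tdel_1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=1000\n",
        "chr1\t100\tbnd_2\tN\tN[chr2:300[\t.\tPASS\tSVTYPE=BND;END=300\n",
    ]
    for rec in find_breakend_end_before_pos(example):
        print(rec.record_id, rec.pos, rec.end)
    # prints: bnd_1 5000 1000
```

In the example, del_1 also has END<POS but is not reported, because it uses a symbolic alt allele, which pysam leaves untouched. On a real file the scanner could be run as find_breakend_end_before_pos(open_vcf(path)); the reported records are the ones whose END would need to be recovered from the raw text rather than from record.stop.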