Skip to content

esvee crashing with negative indices from low complexity regions #779

@mulderdt

Description

@mulderdt

Hi, we are running the Purple workflow.

You can see the parameters used to run it here: #778

Unfortunately for around half of our paired tumour/normal samples the workflow crashes during the Esvee v1.1.2 step. What is happening is that in these samples there is a single low complexity region read trying to be joined with 1 or more right reads from other low complexity regions.

[dmulder@n105 1b75f0fc9a36870892ad816a6a1a8f]$ cat esvee.log 
19:51:48.292 [INFO ] Esvee version 1.1.2
19:51:48.295 [INFO ] output(esvee/) 
19:51:48.313 [INFO ] loaded 461 known hotspot records from file
INFO	2026-01-15 19:51:48	Defaults	Found file for property samjdk.reference_fasta: /projects/trans_scratch/purple/POG2287/F165045_F162454/purple-v3.2.3/ut_1/work/a3/1b75f0fc9a36870892ad816a6a1a8f/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna 
19:51:50.091 [INFO ] running Esvee Prep for F165045,F162454
19:51:50.092 [INFO ] calculating fragment size distribution
19:51:52.662 [INFO ] maxReadLength(150) 
19:51:52.686 [INFO ] fragment length distribution bounds(min=156 max=488)
19:51:52.688 [INFO ] processing chromosome(chr1)
19:53:21.953 [INFO ] chromosome(chr1) 249 regions complete, stats: reads(282789323) junc(15201) juncFrags(146354) supportFrags(init=6852681 final=499796) groups(comp=439299 incomp=3947121 span=2466179)
...
20:14:03.961 [INFO ] processing chromosome(chrY)
20:15:09.336 [INFO ] chromosome(chrY) 58 regions complete, stats: reads(23980004) junc(9153) juncFrags(142822) supportFrags(init=2367892 final=786872) groups(comp=121469 incomp=1090667 span=1155924)
20:15:09.352 [INFO ] assigning candidate mate read to junctions
20:15:31.408 [INFO ] candidate reads assignment complete
20:15:32.105 [INFO ] 19059255 records written to BAM
20:16:22.104 [INFO ] Esvee prep complete, mins(24.534)
20:16:22.338 [INFO ] writing to output directory(esvee/)
20:16:22.339 [INFO ] fragment length bounds(min=156 max=488)
20:16:22.994 [INFO ] loaded 234841 junctions, types(split=227932 discordant=6909 indel=26121 hotspot=0) from file: esvee/F165045.esvee.prep.junction.tsv
20:17:05.424 [INFO ] created 100108 junction assemblies from reads(junc=3361516 candidate=5354980)
20:17:05.424 [INFO ] extracted read stats: reads(15856542) lowBaseQual(0) trim(polyG=97945 lowBase=1016064) indelSoftClip(3366481) decoySequences(0) refBaseAlignFails(56592)
20:17:06.167 [INFO ] building remote phase groups, current group count(25348)
20:17:07.029 [INFO ] phase group building complete, final group count(22710)
20:17:07.063 [INFO ] building phase sets from 40251 phase groups
20:17:12.591 [ERROR] failed building phase group(id(4734) assemblies(329)) sets
... (lines removed)
20:17:12.603 [INFO ] assembly: junc(chr17:26962689:-1) coords(extLen=96 refLen=118 refBasePos=26962806 len=214 juncIndex=96) support(111) mismatches(0)
java.lang.StringIndexOutOfBoundsException: begin -47, end 53, length 53
	at java.base/java.lang.String.checkBoundsBeginEnd(String.java:4606)
	at java.base/java.lang.String.substring(String.java:2709)
	at com.hartwig.hmftools.esvee.assembly.types.JunctionSequence.matchSequence(JunctionSequence.java:235)
	at com.hartwig.hmftools.esvee.assembly.phase.AssemblyLinker.tryAssemblyOverlap(AssemblyLinker.java:282)
	at com.hartwig.hmftools.esvee.assembly.phase.PhaseSetBuilder.checkSplitLink(PhaseSetBuilder.java:730)
	at com.hartwig.hmftools.esvee.assembly.phase.PhaseSetBuilder.findSplitLinkCandidates(PhaseSetBuilder.java:328)
	at com.hartwig.hmftools.esvee.assembly.phase.PhaseSetBuilder.applyOtherLinksAndExtensions(PhaseSetBuilder.java:462)
	at com.hartwig.hmftools.esvee.assembly.phase.PhaseSetBuilder.findOtherLinksAndExtensions(PhaseSetBuilder.java:449)
	at com.hartwig.hmftools.esvee.assembly.phase.PhaseSetBuilder.buildPhaseSets(PhaseSetBuilder.java:155)
	at com.hartwig.hmftools.esvee.assembly.phase.PhaseSetBuilder.run(PhaseSetBuilder.java:141)
	at com.hartwig.hmftools.esvee.assembly.phase.PhaseSetTask.run(PhaseSetTask.java:82) 

I introduced some logging and protection (we warn instead of crashing when there is an illegal index access attempt) and rebuilt the jar for v1.1.2 and it completed successfully.

15:55:21.711 [INFO ] building phase sets from 40254 phase groups
15:55:37.960 [WARN ] Skipping invalid match sequence bounds: start=-47, end=53, seqLength=53
15:55:37.960 [WARN ] Invalid match sequence event: firstJunction=chr16:36101701:1, firstPos=36101701, secondJunction=chr16:33028033:-1, secondPos=33028033, firstSeqLen=53, secondSeqLen=158, firstSeqBases=CAAAAAGCCGCGACGGCGGGGGCAAAAAGTCGCGGTGGCGGGGGTAAGAAGCC, secondSeqBases=AAGCCAGGTCGGCAAAAAGCCGCGGTGATGGGGGCAAAAAGCCGCGGCGGCAGGGGCAAAAAACCACAAAAAGCCGCAGCGGCGGGCGCAAAAAGCTGCAACGGTGGGGGCAAAAAGCTGGGGCGGTGGGGGAAAAAGCCGGGGCGACGGGGGCAAAA
15:55:37.960 [WARN ] Skipping invalid match sequence bounds: start=-47, end=53, seqLength=53
15:55:37.960 [WARN ] Invalid match sequence event: firstJunction=chr16:36101701:1, firstPos=36101701, secondJunction=chr16:33028033:-1, secondPos=33028033, firstSeqLen=53, secondSeqLen=174, firstSeqBases=CAAAAAGCCGCGACGGCGGGGGCAAAAAGTCGCGGTGGCGGGGGTAAGAAGCC, secondSeqBases=TCGACAAANAGCCGTGGCGGCGGGGAAAAAGCCGCGGTGGTGGGGGCAAAAAGCCGCGGCGGCGGGGGCAAAAAACCACAAAAAGCCGCGGCGGCCGGCGCAAAAAGCCGCAACGGTGGGGGCAAAAATCCGGGGCGGTGGGGGAAAAAGCCGGGGCGACGGGGGCAAAAAGCT
15:55:37.961 [WARN ] Skipping invalid match sequence bounds: start=-47, end=53, seqLength=53
15:55:37.961 [WARN ] Invalid match sequence event: firstJunction=chr16:36101701:1, firstPos=36101701, secondJunction=chr16:33028067:-1, secondPos=33028067, firstSeqLen=53, secondSeqLen=204, firstSeqBases=CAAAAAGCCGCGACGGCGGGGGCAAAAAGTCGCGGTGGCGGGGGTAAGAAGCC, secondSeqBases=CAGCGGTAAAAACCTGTGGGGGCAGGAGCAAAAAGCCGCGGCACGGGGAAAAATCCGCGGGGGGCAAAAAGCCACGGCGGCGGGTGCAAAATGCCGCAACGGTGGGGTCAAAAAGCCTGGGCGGTGGGGGAAAAAGCCGGGGCGACGGGGGCAACAAGGCACGGCGGCGGGGGCAACAAGCCATGGCGGCCGAGGCAAACAACC
15:55:37.961 [WARN ] Skipping invalid match sequence bounds: start=-47, end=53, seqLength=53
15:55:37.961 [WARN ] Invalid match sequence event: firstJunction=chr16:36101701:1, firstPos=36101701, secondJunction=chr16:33028156:1, secondPos=33028156, firstSeqLen=53, secondSeqLen=203, firstSeqBases=CAAAAAGCCGCGACGGCGGGGGCAAAAAGTCGCGGTGGCGGGGGTAAGAAGCC, secondSeqBases=TTTTTGCCCCCACCGTCGCGGCTTATTGACCACGACGCTGCGGCTTTTTACGGCTTTTTGCCCCCGTCACCGCAGCTTTTTGCCCCCGTCGCCCCGGCTTTTTCCCCCACCGCCCCAGCTTTTTGCCCCCACCGTTGCAGCTTTTTGCGCCCGCCGCTGCGGCTTTTTGTGGTTTTTTGCCCCCGCCGCCGCGGCTTTTTGCC
15:55:37.961 [WARN ] Skipping invalid match sequence bounds: start=-47, end=53, seqLength=53
15:55:37.961 [WARN ] Invalid match sequence event: firstJunction=chr16:36101701:1, firstPos=36101701, secondJunction=chr1:125119339:1, secondPos=125119339, firstSeqLen=53, secondSeqLen=150, firstSeqBases=CAAAAAGCCGCGACGGCGGGGGCAAAAAGTCGCGGTGGCGGGGGTAAGAAGCC, secondSeqBases=CTGCAGCGGCCGGAGTAAAAAGCCAGGTCGGCAAAAAGCCGCGGTGATGGGGGCAAAAAGCCGCGGCGGCAGGGGCAAAAAACCACAAAAAGCCGCGGCGGCGGGCGCAAAAAGCTGCAACGGTGGGGGCAAAAAGCTGGGGCGGTGGGG
15:55:37.961 [WARN ] Skipping invalid match sequence bounds: start=-47, end=53, seqLength=53
15:55:37.962 [WARN ] Invalid match sequence event: firstJunction=chr16:36101701:1, firstPos=36101701, secondJunction=chr7:60956751:1, secondPos=60956751, firstSeqLen=53, secondSeqLen=141, firstSeqBases=CAAAAAGCCGCGACGGCGGGGGCAAAAAGTCGCGGTGGCGGGGGTAAGAAGCC, secondSeqBases=CCTGCAGCGGCCGGAGTAAAAAGCCAGGTCGGCAAAAAGCCGCGGTGATGGGGGCAAAAAGCCGCGGCGGCAGGGGCAAAAAACCACAAAAAGCCGCGGCGGCGGGCGCAAAAAGCTGCAACGGTGGGGGCAAAAAGCTGG
15:55:37.962 [WARN ] Skipping invalid match sequence bounds: start=-47, end=53, seqLength=53
15:55:37.962 [WARN ] Invalid match sequence event: firstJunction=chr16:36101701:1, firstPos=36101701, secondJunction=chr7:60956833:1, secondPos=60956833, firstSeqLen=53, secondSeqLen=191, firstSeqBases=CAAAAAGCCGCGACGGCGGGGGCAAAAAGTCGCGGTGGCGGGGGTAAGAAGCC, secondSeqBases=CGATCTCGATCCCGCCCTCAGACCCGCGGCAGTGGGGGCAAAAAGCTGCGACAGCGGGGTGAAAAAGCCGCAGCGGTAAAAACCTGCAGCGGCCGGAGTAAAAAGCCAGGTCGGCAAAAAGCCGCGGTGATGGGGGCAAAAAGCCGCGGCGGCGGGGGCAAAAAACCACAAAAAGCCGCAGCGGCGGGCGC
15:55:37.962 [WARN ] Skipping invalid match sequence bounds: start=-47, end=53, seqLength=53
15:55:37.962 [WARN ] Invalid match sequence event: firstJunction=chr16:36101701:1, firstPos=36101701, secondJunction=chr10:42149930:-1, secondPos=42149930, firstSeqLen=53, secondSeqLen=143, firstSeqBases=CAAAAAGCCGCGACGGCGGGGGCAAAAAGTCGCGGTGGCGGGGGTAAGAAGCC, secondSeqBases=GCCGATCCCGCCCTCAGACCCGCGGCGGTGGGGGCAAAAAGCCGCGGTGATGGGGGCAACAAGCCGCGGTGGCGGGGGCAAAAGCCGCGGCGGCGGAGGCAAAAAGCCGTAAAAAGCCGCAGCGTCGGGGGCAAAAAGCCGCG
15:55:37.962 [WARN ] Skipping invalid match sequence bounds: start=-47, end=53, seqLength=53
15:55:37.962 [WARN ] Invalid match sequence event: firstJunction=chr16:36101701:1, firstPos=36101701, secondJunction=chr10:42149950:-1, secondPos=42149950, firstSeqLen=53, secondSeqLen=139, firstSeqBases=CAAAAAGCCGCGACGGCGGGGGCAAAAAGTCGCGGTGGCGGGGGTAAGAAGCC, secondSeqBases=GCAACAAGCCATGGCGGCCGAGGCAAACAGCTGCGGCGACAAAAAGCTGCGGTGGGGGGAGCAAAAAGCCATGGCGGCGGGGGCAAAAAGCTGTGGTGACGGGGGCGAAAAGCCGTAAAAAGCCACAACATCGGGGGCA
15:55:37.962 [WARN ] Skipping invalid match sequence bounds: start=-47, end=53, seqLength=53
15:55:37.962 [WARN ] Invalid match sequence event: firstJunction=chr16:36101701:1, firstPos=36101701, secondJunction=chr10:42149951:-1, secondPos=42149951, firstSeqLen=53, secondSeqLen=158, firstSeqBases=CAAAAAGCCGCGACGGCGGGGGCAAAAAGTCGCGGTGGCGGGGGTAAGAAGCC, secondSeqBases=GCAACGGGGGCAACAAGCGGCGGCGGCAGGGGCAACAAGCCGCGGCGGCCGGGGCAAACAGCCGCGGTGACAAAAAGCTGCGGCGGCCGGGGCAAAAAGCTGCGGTGACGGGAGCAAAAAGCTGTAAAAAGCCACAGCGTCGGGGGCAAAAAGCCGCG
15:55:37.962 [WARN ] Skipping invalid match sequence bounds: start=-47, end=53, seqLength=53
15:55:37.962 [WARN ] Invalid match sequence event: firstJunction=chr16:36101701:1, firstPos=36101701, secondJunction=chr10:42150022:-1, secondPos=42150022, firstSeqLen=53, secondSeqLen=184, firstSeqBases=CAAAAAGCCGCGACGGCGGGGGCAAAAAGTCGCGGTGGCGGGGGTAAGAAGCC, secondSeqBases=AGCCGTAAAAAGCCGCAGCGTCGTGGTCAATAAGCCGCGACGGTGGGGGCAAAAAGCCGCGTTGGCGGGGGTAAGAAGCCGCGGCGGCAAAAAGATGCGGCGGCGCGGCGGCGGGGGCAAAGAGCAGGGGCGGCAAAAAGCCGCGGCGGTAGCGGGGGCCAAAAGCCGCGGCAGGAAAAACCTG
15:55:37.963 [WARN ] Skipping invalid match sequence bounds: start=-47, end=53, seqLength=53
15:55:37.963 [WARN ] Invalid match sequence event: firstJunction=chr16:36101701:1, firstPos=36101701, secondJunction=chr10:42149833:-1, secondPos=42149833, firstSeqLen=53, secondSeqLen=183, firstSeqBases=CAAAAAGCCGCGACGGCGGGGGCAAAAAGTCGCGGTGGCGGGGGTAAGAAGCC, secondSeqBases=AACCTGCGGCGGCGGGAGTAAAAAGCCGCGTCGACAAAAAGCCGTGGCGGCGGGGAAAAAGCCGCGGCGGCGGGGGCAAAAAACCACAAAAAGCCGCGGCGGCGGGCGCAAAAAGCCGCAACGGTGGGGGCAAAAATCCGGGGCGGTGGGGGAAAAAGCCGGGGCGACGGGGGCAAAAAGCTG
15:55:37.963 [WARN ] Skipping invalid match sequence bounds: start=-47, end=53, seqLength=53
15:55:37.963 [WARN ] Invalid match sequence event: firstJunction=chr16:36101701:1, firstPos=36101701, secondJunction=chr2:90360358:-1, secondPos=90360358, firstSeqLen=53, secondSeqLen=152, firstSeqBases=CAAAAAGCCGCGACGGCGGGGGCAAAAAGTCGCGGTGGCGGGGGTAAGAAGCC, secondSeqBases=CGCCCTCAGACCCGCGGCGGTGGGGGCAAAAAGCCGCGGTGATGGGGGCAACAAGCCGCGGTGGCGGGGGCAAAAGCCGCGGCGGCGGAGGCAAAAAGCCGTAAAAAGCCGCAGCGTCGGGGGCAAAAAGCCGCGACGGCGGGGGCAAAAAG
15:55:37.963 [WARN ] Skipping invalid match sequence bounds: start=-47, end=53, seqLength=53
15:55:37.963 [WARN ] Invalid match sequence event: firstJunction=chr16:36101701:1, firstPos=36101701, secondJunction=chr2:90360419:1, secondPos=90360419, firstSeqLen=53, secondSeqLen=151, firstSeqBases=CAAAAAGCCGCGACGGCGGGGGCAAAAAGTCGCGGTGGCGGGGGTAAGAAGCC, secondSeqBases=ACCGGGGCTTTTTTCCCCCACCGCCGCAGGTTTTTCCCGCCCCTGCTTTTTGCCCCCGCCACCGCGGCTTCTTACCCCCGCCACCGCGACTTTTTGCCCCCGCCGTCGCGGCTTTTTGCCCCCGACGCTGCGGCTTTTTACGGCTTTTTGC
15:55:37.963 [WARN ] Skipping invalid match sequence bounds: start=-47, end=53, seqLength=53
15:55:37.963 [WARN ] Invalid match sequence event: firstJunction=chr16:36101701:1, firstPos=36101701, secondJunction=chr17:26731431:-1, secondPos=26731431, firstSeqLen=53, secondSeqLen=180, firstSeqBases=CAAAAAGCCGCGACGGCGGGGGCAAAAAGTCGCGGTGGCGGGGGTAAGAAGCC, secondSeqBases=CACCGTTGCGGCTTTTTGCGCCCGCCGCCGCGGCTTTTTGTGGTTTTTTGCCCCCGCCGCCGCGGCTTTTTCCCCGCCGCCACGGCTTTTTGTCGACGCGGCTTTTTACTCCCGCCGCCGCAGGTTTTTACCGCTGCGGCTTTTTAACCCCGCCGTCGCAGCTTTTTGCCCCTACCGCCG
15:55:37.963 [WARN ] Skipping invalid match sequence bounds: start=-47, end=53, seqLength=53
15:55:37.964 [WARN ] Invalid match sequence event: firstJunction=chr16:36101701:1, firstPos=36101701, secondJunction=chr17:26731468:-1, secondPos=26731468, firstSeqLen=53, secondSeqLen=217, firstSeqBases=CAAAAAGCCGCGACGGCGGGGGCAAAAAGTCGCGGTGGCGGGGGTAAGAAGCC, secondSeqBases=AGGCTTTTTGACCCCACCGTTGCGGCATTTTGCACCCGCCGCCGTGGCTTTTTGCCCCCCGCGGATTTTTCCCCGTGCCGCGGCTTTTTGCTCCTGCCCCCACAGGTTTTTACCGCTGCGGCTTTTTCTCCCCGCCGTTGCGGGTTTTTCCCCCCACGACCGCGGCTTTTTGCCCCCACCGCCGCGGGTCTGAGGGCGGGATCGGCAAACTCGGCTG
15:55:37.964 [WARN ] Skipping invalid match sequence bounds: start=-47, end=53, seqLength=53
15:55:37.964 [WARN ] Invalid match sequence event: firstJunction=chr16:36101701:1, firstPos=36101701, secondJunction=chr17:26731655:-1, secondPos=26731655, firstSeqLen=53, secondSeqLen=557, firstSeqBases=CAAAAAGCCGCGACGGCGGGGGCAAAAAGTCGCGGTGGCGGGGGTAAGAAGCC, secondSeqBases=CTCTACCGGCGTCCTGGCTAAGGCAGCGCCGAGGGGCGCTCCTTGTCCAGCTCTCCTGGTTCGGGCGTTCCTTGCCTAGACGCTGGCGCCCCGGGCTCTGCTCCTGGGCCGCTGCAGCCTGCATAGAGCAGCGCTGCGCTCGGCCGCGTTGGGAGAGAAGAAGGAGGGCGGTGGCGGGGGTGACGCGGCTATCGCGGAGGGAGGCTCACGGGCCGCGGCCAGCCAGGTGCTGCAGCAGTGCGGGCAGCTCCAGAAGCTCATCAGCATCTCTGTTGGCAGCCTGCGCGGGCTGCGCACCATGTGCGCTGTGTCCAAGGACCTCACCCAGCAGGAGATACGGACCATGGAGGTAAGGGGGTCAGGGACAAGGGCTGGGCTCCCGCACCGGACTGGACATCTCCCTCGGGGCCCCAGTTCACTCCTGGCCGAGTTGCGTCCTTGAGCCCGCGTCGCCTCCCTGGAGGCTTCTCCTCCATCCTGCACTCGCTGATGCGGCAGCCAGAGGACCCGGGACCAGCCCTCACCTTGGGCAGGATTTGTGGGGCGGGTGCGTGTTG
15:55:37.964 [WARN ] Skipping invalid match sequence bounds: start=-47, end=53, seqLength=53
15:55:37.964 [WARN ] Invalid match sequence event: firstJunction=chr16:36101701:1, firstPos=36101701, secondJunction=chr17:26732212:1, secondPos=26732212, firstSeqLen=53, secondSeqLen=173, firstSeqBases=CAAAAAGCCGCGACGGCGGGGGCAAAAAGTCGCGGTGGCGGGGGTAAGAAGCC, secondSeqBases=ATGGAAAATCGGGGGGTAAGGGGATGTCCGCGCGCAGCCCACCCCGCCCACGGGACCCTCGAGCCTCCATCACAGTTCCCAACACGCACCCGCCCCACAAATCCTGCCCAAGGTGAGGGCTGGTCCCGGGTCCTCTGGCTGCCGCATCAGCGAGTGCAGGAGGGAGGAGAAGC
15:55:37.964 [WARN ] Skipping invalid match sequence bounds: start=-47, end=53, seqLength=53
15:55:37.964 [WARN ] Invalid match sequence event: firstJunction=chr16:36101701:1, firstPos=36101701, secondJunction=chr2:91442426:1, secondPos=91442426, firstSeqLen=53, secondSeqLen=234, firstSeqBases=CAAAAAGCCGCGACGGCGGGGGCAAAAAGTCGCGGTGGCGGGGGTAAGAAGCC, secondSeqBases=GCAAAAAGCCACAAAAAGCCGGGGCGGCGGGGTAAGAAGCCGCGGCGGCGTGGCAAGAGGCCGGCGGCGGGGCAAGAGGCCGCGGCGGCGGGGCAAGAGGCCGCGGCGGCGGGGCAAGAGGCCGCGGCGGGAAAAACCTGCGGCGGCGGGGGCGAAAAGCAGTAAAAAGCCGCGGCGCCGGGGGCCAAAAGCCATAAAAAGTCGCGGTGGCGGTGACAAAAAGCCGCTGCGGAA
15:55:37.964 [WARN ] Skipping invalid match sequence bounds: start=-47, end=53, seqLength=53
15:55:37.964 [WARN ] Invalid match sequence event: firstJunction=chr16:36101701:1, firstPos=36101701, secondJunction=chr2:91442426:1, secondPos=91442426, firstSeqLen=53, secondSeqLen=226, firstSeqBases=CAAAAAGCCGCGACGGCGGGGGCAAAAAGTCGCGGTGGCGGGGGTAAGAAGCC, secondSeqBases=GGGTAAGAAGCCGCGGCGGCGTGGCAAGAGGCCGGCGGCGGGGCAAGAGGCCGCGGCGGCGGGGCAAGAGGCCGCGGCGGCGGGGCAAGAGGCCGCGGCGGCGGGGCAAGAGGCCGCGGCGGGAAAAACCTGCGGCGGCGGGGGCGAAAAGCAGTAAAAAGCCGCGGCGCCGGGGGCCAAAAGCCATAAAAAGTCGCGGTGGCGGTGACAAAAAGCCGCTGCGGAA
15:55:37.964 [WARN ] Skipping invalid match sequence bounds: start=-47, end=53, seqLength=53
15:55:37.964 [WARN ] Invalid match sequence event: firstJunction=chr16:36101701:1, firstPos=36101701, secondJunction=chr2:91442456:1, secondPos=91442456, firstSeqLen=53, secondSeqLen=231, firstSeqBases=CAAAAAGCCGCGACGGCGGGGGCAAAAAGTCGCGGTGGCGGGGGTAAGAAGCC, secondSeqBases=AGGAAGCCGCGCAGGGGGCAAGGAGCCACGGCGGCGGGGGCAAAACGCTGCTGTGGCGGGGCAAATAGAAGCAAAAAGCCGCGGCGGCGTGGGCAAAAAGCTGCAAAAAGCCCGGGCGGCGGGGCAAGAAGCCGCGGCGGGAAAAACCTGCGGCGGCACGGGCGAAAAGCAGTAAAAAACCGCGGCGCCGGGGGCCAAAAGCCATAAAAAGCCTCGGCGGCGGTGGCAAAA
15:55:37.965 [WARN ] Skipping invalid match sequence bounds: start=-47, end=53, seqLength=53
15:55:37.965 [WARN ] Invalid match sequence event: firstJunction=chr16:36101701:1, firstPos=36101701, secondJunction=chr16:34014974:1, secondPos=34014974, firstSeqLen=53, secondSeqLen=102, firstSeqBases=CAAAAAGCCGCGACGGCGGGGGCAAAAAGTCGCGGTGGCGGGGGTAAGAAGCC, secondSeqBases=AAAAAGCCGCGGCAGCGGGGAAAAAGCCGCTGTGATGGGGGAAAGAAGCCGCGGCAGCGGGGGCAAAAAGCCACAAAAAGCCGCTGCGGCGGGCGCAAAAAG
15:56:03.515 [INFO ] Assembly completed with 22 invalid matchSequence events
16:12:56.321 [INFO ] created 34304 phase sets, remote reads extracted(106508)

details of my changes:

  1. Modify the JunctionSequence.java file to protect the call from failing when given invalid values for start/end and to log such incidents.

 

/projects/dmulder_prj/scratch/purple/esvee-v1.1.2/hmftools-esvee-v1.1.2/esvee/src/main/java/com/hartwig/hmftools/esvee/assembly/types/JunctionSequence.java

[dmulder@gphost10 types]$ diff JunctionSequence.java JunctionSequence.java.backup
12,15d11
< //for debugging purple issue
< import com.hartwig.hmftools.esvee.assembly.AssemblyStats;
< import static com.hartwig.hmftools.esvee.assembly.AssemblyConfig.SV_LOGGER;
< 
237,258c233,235
< // original
< //    public final String matchSequence()
< //    {
< //        return FullSequence.substring(mMatchSeqIndexStart, mMatchSeqIndexEnd + 1);
< //    }
< 
<     //with graceful failure and logging for debugging purple issue
<     public final String matchSequence() {
<         int start = mMatchSeqIndexStart;
<         int end = mMatchSeqIndexEnd + 1;
< 
<         if (start < 0 || end > FullSequence.length() || start >= end) {
<             // record failure
<             AssemblyStats.incrementInvalidMatchSequence();
<             SV_LOGGER.warn(
<                 "Skipping invalid match sequence bounds: start={}, end={}, seqLength={}",
<                 start, end, FullSequence.length()
<             );
<             return "";  // or null if your caller handles null better
<         }
< 
<         return FullSequence.substring(start, end);
---
>     public final String matchSequence()
>     {
>         return FullSequence.substring(mMatchSeqIndexStart, mMatchSeqIndexEnd + 1); 

 

  1. Add AssemblyStats.java file for tracking incidents
[dmulder@gphost10 assembly]$ readlink -f AssemblyStats.java 
/projects/dmulder_prj/scratch/purple/esvee-v1.1.2/hmftools-esvee-v1.1.2/esvee/src/main/java/com/hartwig/hmftools/esvee/assembly/AssemblyStats.java

[dmulder@gphost10 assembly]$ cat AssemblyStats.java 
package com.hartwig.hmftools.esvee.assembly;


import java.util.concurrent.atomic.AtomicLong;


public final class AssemblyStats
{
    private static final AtomicLong INVALID_MATCH_SEQUENCE = new AtomicLong();


    private AssemblyStats() {}


    public static void incrementInvalidMatchSequence()
    {
        INVALID_MATCH_SEQUENCE.incrementAndGet();
    }


    public static long invalidMatchSequenceCount()
    {
        return INVALID_MATCH_SEQUENCE.get();
    }
} 

 

  1. Modified PhaseSetTask.java to provide a total of incidents
[dmulder@gphost10 phase]$ readlink -f PhaseSetTask.java
/projects/dmulder_prj/scratch/purple/esvee-v1.1.2/hmftools-esvee-v1.1.2/esvee/src/main/java/com/hartwig/hmftools/esvee/assembly/phase/PhaseSetTask.java

[dmulder@gphost10 phase]$ diff PhaseSetTask.java PhaseSetTask.java.backup
12,14d11
< //added for debugging purple issue
< import java.util.concurrent.atomic.AtomicBoolean;
< 
23,25d19
< //added for debugging purple issue
< import com.hartwig.hmftools.esvee.assembly.AssemblyStats;
< 
33,35d26
<     //added for debugging purple issue
<     private static final AtomicBoolean SUMMARY_LOGGED = new AtomicBoolean(false);
< 
110,118d100
<              	 // add this section for purple issue debug
<                  if(SUMMARY_LOGGED.compareAndSet(false, true))
<                  {
<                     SV_LOGGER.info(
<                         "Assembly completed with {} invalid matchSequence events",
<                         AssemblyStats.invalidMatchSequenceCount()
<                     );
<                   }
<                  // end added section for purple issue debug 

 

  1. Modified AssemblyLinker.java file to provide more extensive logging at the higher level where matchSequence is called since the useful objects are not in scope within matchSequence itself.
[dmulder@gphost10 phase]$ readlink -f AssemblyLinker.java
/projects/dmulder_prj/scratch/purple/esvee-v1.1.2/hmftools-esvee-v1.1.2/esvee/src/main/java/com/hartwig/hmftools/esvee/assembly/phase/AssemblyLinker.java

[dmulder@gphost10 phase]$ diff AssemblyLinker.java AssemblyLinker.java.backup
285,300d284
<         // logging for purple issue debug
<         if (firstMatchSequence.isEmpty()) {
<             SV_LOGGER.warn(
<                 "Invalid match sequence event: firstJunction={}, firstPos={}, secondJunction={}, secondPos={}, firstSeqLen={}, secondSeqLen={}, firstSeqBases={}, secondSeqBases={}",
<                 first.junction().coords(),
<                 first.junction().Position,
<                 second.junction().coords(),
<                 second.junction().Position,
<                 firstSeq.FullSequence.length(),
<                 secondSeq.FullSequence.length(),
<                 firstSeq.FullSequence,
<                 secondSeq.FullSequence
<             );
<         }
<         // end purple issue debug 
 

I then rebuilt the esvee v1.1.2 jar with maven.

I had the same issue with v1.2 but had to introduce one more change to address a new bugs

21:00:29.536 [ERROR] failed to process junction(chr1:200487873:-1) with 195 reads
java.lang.ArrayIndexOutOfBoundsException: Index 116 out of bounds for length 116
	at com.hartwig.hmftools.esvee.assembly.ExtReadParseState.currentBase(ExtReadParseState.java:76)
	at com.hartwig.hmftools.esvee.assembly.ExtensionSeqBuilder.buildSequence(ExtensionSeqBuilder.java:263)
	at com.hartwig.hmftools.esvee.assembly.ExtensionSeqBuilder.<init>(ExtensionSeqBuilder.java:131)
	at com.hartwig.hmftools.esvee.assembly.JunctionAssembler.processJunction(JunctionAssembler.java:125)
	at com.hartwig.hmftools.esvee.assembly.JunctionGroupAssembler.processJunctionGroup(JunctionGroupAssembler.java:186)
	at com.hartwig.hmftools.esvee.assembly.JunctionGroupAssembler.run(JunctionGroupAssembler.java:113) 

and

08:48:14.476 [ERROR] failed to process junction(chr1:200487873:-1) with 195 reads
java.lang.ArrayIndexOutOfBoundsException: Index 116 out of bounds for length 116
        at com.hartwig.hmftools.esvee.assembly.ExtReadParseState.currentQual(ExtReadParseState.java:90)
        at com.hartwig.hmftools.esvee.assembly.ExtensionSeqBuilder.buildSequence(ExtensionSeqBuilder.java:264)
        at com.hartwig.hmftools.esvee.assembly.ExtensionSeqBuilder.<init>(ExtensionSeqBuilder.java:131)
        at com.hartwig.hmftools.esvee.assembly.JunctionAssembler.processJunction(JunctionAssembler.java:125)
        at com.hartwig.hmftools.esvee.assembly.JunctionGroupAssembler.processJunctionGroup(JunctionGroupAssembler.java:186)
        at com.hartwig.hmftools.esvee.assembly.JunctionGroupAssembler.run(JunctionGroupAssembler.java:113) 

details of additional changes for v1.2

[dmulder@gphost10 assembly]$ readlink -f ExtReadParseState.java
/projects/dmulder_prj/scratch/purple/esvee-v1.2/hmftools-esvee-v1.2/esvee/src/main/java/com/hartwig/hmftools/esvee/assembly/ExtReadParseState.java

[dmulder@gphost09 assembly]$ diff ExtReadParseState.java ExtReadParseState.java.backup
76,98c76,77
<     // start original code for purple debug
<     //public byte currentBase() { return mRead.getBases()[mCurrentIndex]; }
<     //public byte currentQual() { return mRead.getBaseQuality()[mCurrentIndex]; }
<     // end original code for purple debug
< 
<     // start insertion of purple debug code
<     public byte currentBase()
<     {
<         if(mExhausted || mCurrentIndex < 0 || mCurrentIndex >= mRead.basesLength())
<            return (byte)0;
< 
<         return mRead.getBases()[mCurrentIndex];
<     }
< 
<     public byte currentQual()
<     {
<         if(mExhausted || mCurrentIndex < 0 || mCurrentIndex >= mRead.basesLength())
<            return (byte)0;
< 
<         return mRead.getBaseQuality()[mCurrentIndex];
<     }
<     // end insertion of purple debug code
< 
---
>     public byte currentBase() { return mRead.getBases()[mCurrentIndex]; }
>     public byte currentQual() { return mRead.getBaseQuality()[mCurrentIndex]; } 

Let me know if you are interested or want me to submit a PR.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions