Hi, we are running the Purple workflow.
Unfortunately, for around half of our paired tumour/normal samples, the workflow crashes during the Esvee v1.1.2 step. In these samples, a single read from a low-complexity region is being joined with one or more right reads from other low-complexity regions.
I added some logging and a guard (it warns instead of crashing when an illegal index access is attempted), rebuilt the jar for v1.1.2, and the run completed successfully.
15:55:21.711 [INFO ] building phase sets from 40254 phase groups
15:55:37.960 [WARN ] Skipping invalid match sequence bounds: start=-47, end=53, seqLength=53
15:55:37.960 [WARN ] Invalid match sequence event: firstJunction=chr16:36101701:1, firstPos=36101701, secondJunction=chr16:33028033:-1, secondPos=33028033, firstSeqLen=53, secondSeqLen=158, firstSeqBases=CAAAAAGCCGCGACGGCGGGGGCAAAAAGTCGCGGTGGCGGGGGTAAGAAGCC, secondSeqBases=AAGCCAGGTCGGCAAAAAGCCGCGGTGATGGGGGCAAAAAGCCGCGGCGGCAGGGGCAAAAAACCACAAAAAGCCGCAGCGGCGGGCGCAAAAAGCTGCAACGGTGGGGGCAAAAAGCTGGGGCGGTGGGGGAAAAAGCCGGGGCGACGGGGGCAAAA
15:55:37.960 [WARN ] Skipping invalid match sequence bounds: start=-47, end=53, seqLength=53
15:55:37.960 [WARN ] Invalid match sequence event: firstJunction=chr16:36101701:1, firstPos=36101701, secondJunction=chr16:33028033:-1, secondPos=33028033, firstSeqLen=53, secondSeqLen=174, firstSeqBases=CAAAAAGCCGCGACGGCGGGGGCAAAAAGTCGCGGTGGCGGGGGTAAGAAGCC, secondSeqBases=TCGACAAANAGCCGTGGCGGCGGGGAAAAAGCCGCGGTGGTGGGGGCAAAAAGCCGCGGCGGCGGGGGCAAAAAACCACAAAAAGCCGCGGCGGCCGGCGCAAAAAGCCGCAACGGTGGGGGCAAAAATCCGGGGCGGTGGGGGAAAAAGCCGGGGCGACGGGGGCAAAAAGCT
15:55:37.961 [WARN ] Skipping invalid match sequence bounds: start=-47, end=53, seqLength=53
15:55:37.961 [WARN ] Invalid match sequence event: firstJunction=chr16:36101701:1, firstPos=36101701, secondJunction=chr16:33028067:-1, secondPos=33028067, firstSeqLen=53, secondSeqLen=204, firstSeqBases=CAAAAAGCCGCGACGGCGGGGGCAAAAAGTCGCGGTGGCGGGGGTAAGAAGCC, secondSeqBases=CAGCGGTAAAAACCTGTGGGGGCAGGAGCAAAAAGCCGCGGCACGGGGAAAAATCCGCGGGGGGCAAAAAGCCACGGCGGCGGGTGCAAAATGCCGCAACGGTGGGGTCAAAAAGCCTGGGCGGTGGGGGAAAAAGCCGGGGCGACGGGGGCAACAAGGCACGGCGGCGGGGGCAACAAGCCATGGCGGCCGAGGCAAACAACC
15:55:37.961 [WARN ] Skipping invalid match sequence bounds: start=-47, end=53, seqLength=53
15:55:37.961 [WARN ] Invalid match sequence event: firstJunction=chr16:36101701:1, firstPos=36101701, secondJunction=chr16:33028156:1, secondPos=33028156, firstSeqLen=53, secondSeqLen=203, firstSeqBases=CAAAAAGCCGCGACGGCGGGGGCAAAAAGTCGCGGTGGCGGGGGTAAGAAGCC, secondSeqBases=TTTTTGCCCCCACCGTCGCGGCTTATTGACCACGACGCTGCGGCTTTTTACGGCTTTTTGCCCCCGTCACCGCAGCTTTTTGCCCCCGTCGCCCCGGCTTTTTCCCCCACCGCCCCAGCTTTTTGCCCCCACCGTTGCAGCTTTTTGCGCCCGCCGCTGCGGCTTTTTGTGGTTTTTTGCCCCCGCCGCCGCGGCTTTTTGCC
15:55:37.961 [WARN ] Skipping invalid match sequence bounds: start=-47, end=53, seqLength=53
15:55:37.961 [WARN ] Invalid match sequence event: firstJunction=chr16:36101701:1, firstPos=36101701, secondJunction=chr1:125119339:1, secondPos=125119339, firstSeqLen=53, secondSeqLen=150, firstSeqBases=CAAAAAGCCGCGACGGCGGGGGCAAAAAGTCGCGGTGGCGGGGGTAAGAAGCC, secondSeqBases=CTGCAGCGGCCGGAGTAAAAAGCCAGGTCGGCAAAAAGCCGCGGTGATGGGGGCAAAAAGCCGCGGCGGCAGGGGCAAAAAACCACAAAAAGCCGCGGCGGCGGGCGCAAAAAGCTGCAACGGTGGGGGCAAAAAGCTGGGGCGGTGGGG
15:55:37.961 [WARN ] Skipping invalid match sequence bounds: start=-47, end=53, seqLength=53
15:55:37.962 [WARN ] Invalid match sequence event: firstJunction=chr16:36101701:1, firstPos=36101701, secondJunction=chr7:60956751:1, secondPos=60956751, firstSeqLen=53, secondSeqLen=141, firstSeqBases=CAAAAAGCCGCGACGGCGGGGGCAAAAAGTCGCGGTGGCGGGGGTAAGAAGCC, secondSeqBases=CCTGCAGCGGCCGGAGTAAAAAGCCAGGTCGGCAAAAAGCCGCGGTGATGGGGGCAAAAAGCCGCGGCGGCAGGGGCAAAAAACCACAAAAAGCCGCGGCGGCGGGCGCAAAAAGCTGCAACGGTGGGGGCAAAAAGCTGG
15:55:37.962 [WARN ] Skipping invalid match sequence bounds: start=-47, end=53, seqLength=53
15:55:37.962 [WARN ] Invalid match sequence event: firstJunction=chr16:36101701:1, firstPos=36101701, secondJunction=chr7:60956833:1, secondPos=60956833, firstSeqLen=53, secondSeqLen=191, firstSeqBases=CAAAAAGCCGCGACGGCGGGGGCAAAAAGTCGCGGTGGCGGGGGTAAGAAGCC, secondSeqBases=CGATCTCGATCCCGCCCTCAGACCCGCGGCAGTGGGGGCAAAAAGCTGCGACAGCGGGGTGAAAAAGCCGCAGCGGTAAAAACCTGCAGCGGCCGGAGTAAAAAGCCAGGTCGGCAAAAAGCCGCGGTGATGGGGGCAAAAAGCCGCGGCGGCGGGGGCAAAAAACCACAAAAAGCCGCAGCGGCGGGCGC
15:55:37.962 [WARN ] Skipping invalid match sequence bounds: start=-47, end=53, seqLength=53
15:55:37.962 [WARN ] Invalid match sequence event: firstJunction=chr16:36101701:1, firstPos=36101701, secondJunction=chr10:42149930:-1, secondPos=42149930, firstSeqLen=53, secondSeqLen=143, firstSeqBases=CAAAAAGCCGCGACGGCGGGGGCAAAAAGTCGCGGTGGCGGGGGTAAGAAGCC, secondSeqBases=GCCGATCCCGCCCTCAGACCCGCGGCGGTGGGGGCAAAAAGCCGCGGTGATGGGGGCAACAAGCCGCGGTGGCGGGGGCAAAAGCCGCGGCGGCGGAGGCAAAAAGCCGTAAAAAGCCGCAGCGTCGGGGGCAAAAAGCCGCG
15:55:37.962 [WARN ] Skipping invalid match sequence bounds: start=-47, end=53, seqLength=53
15:55:37.962 [WARN ] Invalid match sequence event: firstJunction=chr16:36101701:1, firstPos=36101701, secondJunction=chr10:42149950:-1, secondPos=42149950, firstSeqLen=53, secondSeqLen=139, firstSeqBases=CAAAAAGCCGCGACGGCGGGGGCAAAAAGTCGCGGTGGCGGGGGTAAGAAGCC, secondSeqBases=GCAACAAGCCATGGCGGCCGAGGCAAACAGCTGCGGCGACAAAAAGCTGCGGTGGGGGGAGCAAAAAGCCATGGCGGCGGGGGCAAAAAGCTGTGGTGACGGGGGCGAAAAGCCGTAAAAAGCCACAACATCGGGGGCA
15:55:37.962 [WARN ] Skipping invalid match sequence bounds: start=-47, end=53, seqLength=53
15:55:37.962 [WARN ] Invalid match sequence event: firstJunction=chr16:36101701:1, firstPos=36101701, secondJunction=chr10:42149951:-1, secondPos=42149951, firstSeqLen=53, secondSeqLen=158, firstSeqBases=CAAAAAGCCGCGACGGCGGGGGCAAAAAGTCGCGGTGGCGGGGGTAAGAAGCC, secondSeqBases=GCAACGGGGGCAACAAGCGGCGGCGGCAGGGGCAACAAGCCGCGGCGGCCGGGGCAAACAGCCGCGGTGACAAAAAGCTGCGGCGGCCGGGGCAAAAAGCTGCGGTGACGGGAGCAAAAAGCTGTAAAAAGCCACAGCGTCGGGGGCAAAAAGCCGCG
15:55:37.962 [WARN ] Skipping invalid match sequence bounds: start=-47, end=53, seqLength=53
15:55:37.962 [WARN ] Invalid match sequence event: firstJunction=chr16:36101701:1, firstPos=36101701, secondJunction=chr10:42150022:-1, secondPos=42150022, firstSeqLen=53, secondSeqLen=184, firstSeqBases=CAAAAAGCCGCGACGGCGGGGGCAAAAAGTCGCGGTGGCGGGGGTAAGAAGCC, secondSeqBases=AGCCGTAAAAAGCCGCAGCGTCGTGGTCAATAAGCCGCGACGGTGGGGGCAAAAAGCCGCGTTGGCGGGGGTAAGAAGCCGCGGCGGCAAAAAGATGCGGCGGCGCGGCGGCGGGGGCAAAGAGCAGGGGCGGCAAAAAGCCGCGGCGGTAGCGGGGGCCAAAAGCCGCGGCAGGAAAAACCTG
15:55:37.963 [WARN ] Skipping invalid match sequence bounds: start=-47, end=53, seqLength=53
15:55:37.963 [WARN ] Invalid match sequence event: firstJunction=chr16:36101701:1, firstPos=36101701, secondJunction=chr10:42149833:-1, secondPos=42149833, firstSeqLen=53, secondSeqLen=183, firstSeqBases=CAAAAAGCCGCGACGGCGGGGGCAAAAAGTCGCGGTGGCGGGGGTAAGAAGCC, secondSeqBases=AACCTGCGGCGGCGGGAGTAAAAAGCCGCGTCGACAAAAAGCCGTGGCGGCGGGGAAAAAGCCGCGGCGGCGGGGGCAAAAAACCACAAAAAGCCGCGGCGGCGGGCGCAAAAAGCCGCAACGGTGGGGGCAAAAATCCGGGGCGGTGGGGGAAAAAGCCGGGGCGACGGGGGCAAAAAGCTG
15:55:37.963 [WARN ] Skipping invalid match sequence bounds: start=-47, end=53, seqLength=53
15:55:37.963 [WARN ] Invalid match sequence event: firstJunction=chr16:36101701:1, firstPos=36101701, secondJunction=chr2:90360358:-1, secondPos=90360358, firstSeqLen=53, secondSeqLen=152, firstSeqBases=CAAAAAGCCGCGACGGCGGGGGCAAAAAGTCGCGGTGGCGGGGGTAAGAAGCC, secondSeqBases=CGCCCTCAGACCCGCGGCGGTGGGGGCAAAAAGCCGCGGTGATGGGGGCAACAAGCCGCGGTGGCGGGGGCAAAAGCCGCGGCGGCGGAGGCAAAAAGCCGTAAAAAGCCGCAGCGTCGGGGGCAAAAAGCCGCGACGGCGGGGGCAAAAAG
15:55:37.963 [WARN ] Skipping invalid match sequence bounds: start=-47, end=53, seqLength=53
15:55:37.963 [WARN ] Invalid match sequence event: firstJunction=chr16:36101701:1, firstPos=36101701, secondJunction=chr2:90360419:1, secondPos=90360419, firstSeqLen=53, secondSeqLen=151, firstSeqBases=CAAAAAGCCGCGACGGCGGGGGCAAAAAGTCGCGGTGGCGGGGGTAAGAAGCC, secondSeqBases=ACCGGGGCTTTTTTCCCCCACCGCCGCAGGTTTTTCCCGCCCCTGCTTTTTGCCCCCGCCACCGCGGCTTCTTACCCCCGCCACCGCGACTTTTTGCCCCCGCCGTCGCGGCTTTTTGCCCCCGACGCTGCGGCTTTTTACGGCTTTTTGC
15:55:37.963 [WARN ] Skipping invalid match sequence bounds: start=-47, end=53, seqLength=53
15:55:37.963 [WARN ] Invalid match sequence event: firstJunction=chr16:36101701:1, firstPos=36101701, secondJunction=chr17:26731431:-1, secondPos=26731431, firstSeqLen=53, secondSeqLen=180, firstSeqBases=CAAAAAGCCGCGACGGCGGGGGCAAAAAGTCGCGGTGGCGGGGGTAAGAAGCC, secondSeqBases=CACCGTTGCGGCTTTTTGCGCCCGCCGCCGCGGCTTTTTGTGGTTTTTTGCCCCCGCCGCCGCGGCTTTTTCCCCGCCGCCACGGCTTTTTGTCGACGCGGCTTTTTACTCCCGCCGCCGCAGGTTTTTACCGCTGCGGCTTTTTAACCCCGCCGTCGCAGCTTTTTGCCCCTACCGCCG
15:55:37.963 [WARN ] Skipping invalid match sequence bounds: start=-47, end=53, seqLength=53
15:55:37.964 [WARN ] Invalid match sequence event: firstJunction=chr16:36101701:1, firstPos=36101701, secondJunction=chr17:26731468:-1, secondPos=26731468, firstSeqLen=53, secondSeqLen=217, firstSeqBases=CAAAAAGCCGCGACGGCGGGGGCAAAAAGTCGCGGTGGCGGGGGTAAGAAGCC, secondSeqBases=AGGCTTTTTGACCCCACCGTTGCGGCATTTTGCACCCGCCGCCGTGGCTTTTTGCCCCCCGCGGATTTTTCCCCGTGCCGCGGCTTTTTGCTCCTGCCCCCACAGGTTTTTACCGCTGCGGCTTTTTCTCCCCGCCGTTGCGGGTTTTTCCCCCCACGACCGCGGCTTTTTGCCCCCACCGCCGCGGGTCTGAGGGCGGGATCGGCAAACTCGGCTG
15:55:37.964 [WARN ] Skipping invalid match sequence bounds: start=-47, end=53, seqLength=53
15:55:37.964 [WARN ] Invalid match sequence event: firstJunction=chr16:36101701:1, firstPos=36101701, secondJunction=chr17:26731655:-1, secondPos=26731655, firstSeqLen=53, secondSeqLen=557, firstSeqBases=CAAAAAGCCGCGACGGCGGGGGCAAAAAGTCGCGGTGGCGGGGGTAAGAAGCC, secondSeqBases=CTCTACCGGCGTCCTGGCTAAGGCAGCGCCGAGGGGCGCTCCTTGTCCAGCTCTCCTGGTTCGGGCGTTCCTTGCCTAGACGCTGGCGCCCCGGGCTCTGCTCCTGGGCCGCTGCAGCCTGCATAGAGCAGCGCTGCGCTCGGCCGCGTTGGGAGAGAAGAAGGAGGGCGGTGGCGGGGGTGACGCGGCTATCGCGGAGGGAGGCTCACGGGCCGCGGCCAGCCAGGTGCTGCAGCAGTGCGGGCAGCTCCAGAAGCTCATCAGCATCTCTGTTGGCAGCCTGCGCGGGCTGCGCACCATGTGCGCTGTGTCCAAGGACCTCACCCAGCAGGAGATACGGACCATGGAGGTAAGGGGGTCAGGGACAAGGGCTGGGCTCCCGCACCGGACTGGACATCTCCCTCGGGGCCCCAGTTCACTCCTGGCCGAGTTGCGTCCTTGAGCCCGCGTCGCCTCCCTGGAGGCTTCTCCTCCATCCTGCACTCGCTGATGCGGCAGCCAGAGGACCCGGGACCAGCCCTCACCTTGGGCAGGATTTGTGGGGCGGGTGCGTGTTG
15:55:37.964 [WARN ] Skipping invalid match sequence bounds: start=-47, end=53, seqLength=53
15:55:37.964 [WARN ] Invalid match sequence event: firstJunction=chr16:36101701:1, firstPos=36101701, secondJunction=chr17:26732212:1, secondPos=26732212, firstSeqLen=53, secondSeqLen=173, firstSeqBases=CAAAAAGCCGCGACGGCGGGGGCAAAAAGTCGCGGTGGCGGGGGTAAGAAGCC, secondSeqBases=ATGGAAAATCGGGGGGTAAGGGGATGTCCGCGCGCAGCCCACCCCGCCCACGGGACCCTCGAGCCTCCATCACAGTTCCCAACACGCACCCGCCCCACAAATCCTGCCCAAGGTGAGGGCTGGTCCCGGGTCCTCTGGCTGCCGCATCAGCGAGTGCAGGAGGGAGGAGAAGC
15:55:37.964 [WARN ] Skipping invalid match sequence bounds: start=-47, end=53, seqLength=53
15:55:37.964 [WARN ] Invalid match sequence event: firstJunction=chr16:36101701:1, firstPos=36101701, secondJunction=chr2:91442426:1, secondPos=91442426, firstSeqLen=53, secondSeqLen=234, firstSeqBases=CAAAAAGCCGCGACGGCGGGGGCAAAAAGTCGCGGTGGCGGGGGTAAGAAGCC, secondSeqBases=GCAAAAAGCCACAAAAAGCCGGGGCGGCGGGGTAAGAAGCCGCGGCGGCGTGGCAAGAGGCCGGCGGCGGGGCAAGAGGCCGCGGCGGCGGGGCAAGAGGCCGCGGCGGCGGGGCAAGAGGCCGCGGCGGGAAAAACCTGCGGCGGCGGGGGCGAAAAGCAGTAAAAAGCCGCGGCGCCGGGGGCCAAAAGCCATAAAAAGTCGCGGTGGCGGTGACAAAAAGCCGCTGCGGAA
15:55:37.964 [WARN ] Skipping invalid match sequence bounds: start=-47, end=53, seqLength=53
15:55:37.964 [WARN ] Invalid match sequence event: firstJunction=chr16:36101701:1, firstPos=36101701, secondJunction=chr2:91442426:1, secondPos=91442426, firstSeqLen=53, secondSeqLen=226, firstSeqBases=CAAAAAGCCGCGACGGCGGGGGCAAAAAGTCGCGGTGGCGGGGGTAAGAAGCC, secondSeqBases=GGGTAAGAAGCCGCGGCGGCGTGGCAAGAGGCCGGCGGCGGGGCAAGAGGCCGCGGCGGCGGGGCAAGAGGCCGCGGCGGCGGGGCAAGAGGCCGCGGCGGCGGGGCAAGAGGCCGCGGCGGGAAAAACCTGCGGCGGCGGGGGCGAAAAGCAGTAAAAAGCCGCGGCGCCGGGGGCCAAAAGCCATAAAAAGTCGCGGTGGCGGTGACAAAAAGCCGCTGCGGAA
15:55:37.964 [WARN ] Skipping invalid match sequence bounds: start=-47, end=53, seqLength=53
15:55:37.964 [WARN ] Invalid match sequence event: firstJunction=chr16:36101701:1, firstPos=36101701, secondJunction=chr2:91442456:1, secondPos=91442456, firstSeqLen=53, secondSeqLen=231, firstSeqBases=CAAAAAGCCGCGACGGCGGGGGCAAAAAGTCGCGGTGGCGGGGGTAAGAAGCC, secondSeqBases=AGGAAGCCGCGCAGGGGGCAAGGAGCCACGGCGGCGGGGGCAAAACGCTGCTGTGGCGGGGCAAATAGAAGCAAAAAGCCGCGGCGGCGTGGGCAAAAAGCTGCAAAAAGCCCGGGCGGCGGGGCAAGAAGCCGCGGCGGGAAAAACCTGCGGCGGCACGGGCGAAAAGCAGTAAAAAACCGCGGCGCCGGGGGCCAAAAGCCATAAAAAGCCTCGGCGGCGGTGGCAAAA
15:55:37.965 [WARN ] Skipping invalid match sequence bounds: start=-47, end=53, seqLength=53
15:55:37.965 [WARN ] Invalid match sequence event: firstJunction=chr16:36101701:1, firstPos=36101701, secondJunction=chr16:34014974:1, secondPos=34014974, firstSeqLen=53, secondSeqLen=102, firstSeqBases=CAAAAAGCCGCGACGGCGGGGGCAAAAAGTCGCGGTGGCGGGGGTAAGAAGCC, secondSeqBases=AAAAAGCCGCGGCAGCGGGGAAAAAGCCGCTGTGATGGGGGAAAGAAGCCGCGGCAGCGGGGGCAAAAAGCCACAAAAAGCCGCTGCGGCGGGCGCAAAAAG
15:56:03.515 [INFO ] Assembly completed with 22 invalid matchSequence events
16:12:56.321 [INFO ] created 34304 phase sets, remote reads extracted(106508)
[dmulder@gphost10 phase]$ readlink -f AssemblyLinker.java
/projects/dmulder_prj/scratch/purple/esvee-v1.1.2/hmftools-esvee-v1.1.2/esvee/src/main/java/com/hartwig/hmftools/esvee/assembly/phase/AssemblyLinker.java
[dmulder@gphost10 phase]$ diff AssemblyLinker.java AssemblyLinker.java.backup
285,300d284
< // logging for purple issue debug
< if (firstMatchSequence.isEmpty()) {
< SV_LOGGER.warn(
< "Invalid match sequence event: firstJunction={}, firstPos={}, secondJunction={}, secondPos={}, firstSeqLen={}, secondSeqLen={}, firstSeqBases={}, secondSeqBases={}",
< first.junction().coords(),
< first.junction().Position,
< second.junction().coords(),
< second.junction().Position,
< firstSeq.FullSequence.length(),
< secondSeq.FullSequence.length(),
< firstSeq.FullSequence,
< secondSeq.FullSequence
< );
< }
< // end purple issue debug
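The "warn instead of crashing" guard mentioned above can be sketched roughly as follows. This is a minimal illustration, not the actual patch: the class `MatchSequenceGuard` and the method `safeSubsequence` are hypothetical names that do not exist in Esvee, and `System.err` stands in for `SV_LOGGER` so the example is self-contained. It checks the bounds before extracting the match sequence and skips the pair when they are invalid, as with the `start=-47, end=53, seqLength=53` case in the log.

```java
import java.util.Optional;

// Hypothetical helper, not part of the Esvee code base.
final class MatchSequenceGuard
{
    private static int invalidEvents = 0;

    // Returns the bases in [start, end), or empty if the bounds fall outside the sequence.
    static Optional<String> safeSubsequence(final String bases, final int start, final int end)
    {
        if(start < 0 || end > bases.length() || start > end)
        {
            ++invalidEvents;
            // Warn and skip rather than letting substring() throw.
            System.err.printf("Skipping invalid match sequence bounds: start=%d, end=%d, seqLength=%d%n",
                    start, end, bases.length());
            return Optional.empty();
        }

        return Optional.of(bases.substring(start, end));
    }

    static int invalidEvents() { return invalidEvents; }

    public static void main(final String[] args)
    {
        // firstSeqBases from the log above (53 bases), with the reported bounds.
        String firstSeq = "CAAAAAGCCGCGACGGCGGGGGCAAAAAGTCGCGGTGGCGGGGGTAAGAAGCC";
        Optional<String> match = safeSubsequence(firstSeq, -47, 53);
        System.out.println("match present: " + match.isPresent());
        System.out.println("invalid events: " + invalidEvents());
    }
}
```

Returning an empty `Optional` lets the caller skip the candidate link and carry on with the remaining assemblies, while the counter supports a summary line at the end of the run.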
I then rebuilt the Esvee v1.1.2 jar with Maven.
I had the same issue with v1.2, but had to introduce one more change to address a new bug.
Let me know if you are interested or want me to submit a PR.
The parameters used to run the workflow are listed in #778.