Commit c08b363

jkbonfield authored and whitwham committed
Add tests for CRAM TLEN generation.
We convert from previously built CRAM files that don't use the detached flag to SAM files, validating all the auto-TLEN generation code.

Note however some of this code is arbitrary, so the SAM files are not the only valid decoded versions of these CRAM files. The tests here are primarily for regression testing purposes, so we don't accidentally change behaviour without realising. If we subsequently improve our methods of resolving ties (especially around triplets) then we would edit the SAM files too.

Note to build the CRAM files from the SAM files I disabled the "detached" enabling part of the CRAM encoder with this patch:

diff --git a/cram/cram_encode.c b/cram/cram_encode.c
index 6fe797a..2ede0359 100644
--- a/cram/cram_encode.c
+++ b/cram/cram_encode.c
@@ -3881,6 +3881,7 @@ static int process_one_read(cram_fd *fd, cram_container *c,
         }
     }
+ detached:
     /*
      * The fields below are unused when encoding this read as it is
      * no longer detached. In theory they may get referred to when
@@ -3940,7 +3941,7 @@ static int process_one_read(cram_fd *fd, cram_container *c,
         kh_val(s->pair[sec], k) = rnum;
     } else {
-    detached:
+    detached_:
         //fprintf(stderr, "unpaired\n");
         /* Derive mate flags from this flag */
1 parent 4ec8b74 commit c08b363

64 files changed, 393 additions, 0 deletions

Makefile

Lines changed: 1 addition & 0 deletions
@@ -699,6 +699,7 @@ check test: all $(HTSCODECS_TEST_TARGETS)
 	cd test/mpileup && ./test-pileup.sh mpileup.tst
 	cd test/fastq && ./test-fastq.sh
 	cd test/base_mods && ./base-mods.sh base-mods.tst
+	cd test/tlen && ./tlen.sh tlen.tst
 	REF_PATH=: test/sam test/ce.fa test/faidx/faidx.fa test/faidx/fastqs.fq
 	test/test-regidx
 	cd test && \

test/tlen/README

Lines changed: 146 additions & 0 deletions
@@ -0,0 +1,146 @@
+ref AAAAACCCCCGGGGGTTTTT
+    12345678901234567890
+
+Test files for read pairs.
+
+Starting letter:
+    a = both start/end differ
+    b = only ends differ
+    c = only start differ
+    d = both start/end match
+
+Each file has two versions (eg a7 and a7b) which are identical except
+for read ordering with the entries swapping lines. (This may make some
+data unsorted.)
+
+We also have neighbouring pairs (a7/a8, a9/a10) which are the same
+coordinates, but flipping the orientation of READ1 and READ2.
+Eg pointing inwards vs pointing outwards.
+
+
+
+Starts and ends differ:
+
+Combinations: (2) R1 before R2?
+              (2) R1 starts before R2?
+              (2) R1 ends before R2?
+              (2) R1 top, R2 bottom (vs vice versa)?
+              => 16
+
+a7              a8
+1--->           <---1
+   <---2           2--->
+
+2--->           <---2
+   <---1           1--->
+
+a7b             a8b
+   <---2           2--->
+1--->           <---1
+
+   <---1           1--->
+2--->           <---2
+
+a9              a10
+1------>        <------1
+  <--2            2-->
+
+2------>        <------2
+  <--1            1-->
+
+a9b             a10b
+  <--2            2-->
+1------>        <------1
+
+  <--1            1-->
+2------>        <------2
+
+
+
+Starts match, ends differ:
+
+b7              b8
+1--->           <---1
+<------2        2------>
+
+2--->           <---2
+<------1        1------>
+
+b7b             b8b
+<------2        2------>
+1--->           <---1
+
+<------1        1------>
+2--->           <---2
+
+Starts differ, ends match:
+
+c7              c8
+1------>        <------1
+   <---2           2--->
+
+2------>        <------2
+   <---1           1--->
+
+c7b             c8b
+   <---2           2--->
+1------>        <------1
+
+   <---1           1--->
+2------>        <------2
+
+Starts and ends both match:
+
+d7
+1------>
+<------2
+
+2------>
+<------1
+
+d7b
+<------2
+1------>
+
+<------1
+2------>
+
+-----------------------------------------------------------------------------
+Test files for read triplets
+
+d4              d5
+1----->         <-----1
+<-----m         <-----m
+<-----2         2----->
+
+d4b             d5b
+1----->         <-----1
+<-----2         2----->
+<-----m         <-----m
+
+d4c             d5c
+<-----2         2----->
+<-----m         <-----m
+1----->         <-----1
+
+d4d             d5d
+<-----2         2----->
+1----->         <-----1
+<-----m         <-----m
+
+d4e             d5e
+<-----m         <-----m
+1----->         <-----1
+<-----2         2----->
+
+d4f             d5f
+<-----m         <-----m
+<-----2         2----->
+1----->         <-----1
+
+
+a4              a5
+1--->           m--->
+   m--->           1--->
+      <---2           <---2
+
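
The "Combinations" block in the README counts four independent yes/no properties of a read pair, giving 2^4 = 16 cases. A minimal Python sketch of that count follows. The wording of each question is an interpretation of the README's shorthand (for example, reading "R1 before R2" as file order and "top/bottom" as strand), not something the commit states, and `enumerate_cases` is a hypothetical helper name.

```python
from itertools import product

# The four yes/no questions from the README's "Combinations" block.
# The longer descriptions are an assumed reading of the README shorthand.
QUESTIONS = (
    "R1 before R2 (record order)",
    "R1 starts before R2",
    "R1 ends before R2",
    "R1 top strand, R2 bottom strand",
)


def enumerate_cases():
    """Return every True/False assignment to the four questions."""
    return [
        dict(zip(QUESTIONS, answers))
        for answers in product((True, False), repeat=len(QUESTIONS))
    ]


cases = enumerate_cases()
print(len(cases))
```

Not every one of these combinations need correspond to a distinct test file; the listing only shows where the figure of 16 comes from.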

test/tlen/a4.cram

517 Bytes
Binary file not shown.

test/tlen/a4.sam

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+@SQ SN:ref LN:20
+seq 65 ref 3 0 7M = 6 13 AAACCCC IIIIIII
+seq 225 ref 6 0 7M = 9 -13 CCCCCGG IIIIIII
+seq 145 ref 9 0 7M = 3 -13 CCGGGGG IIIIIII
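
The FLAG values in a4.sam (65, 225, 145) encode which segment of the triplet each record is and which way it and its mate point. As a reading aid, here is a small Python sketch that decodes them using the bit values from the SAM specification. `decode_flag` and the short bit names are illustrative choices, not part of htslib.

```python
# FLAG bits as defined by the SAM specification; the short names are ours.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first",
    0x80: "last",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


for flag in (65, 225, 145):
    print(flag, decode_flag(flag))
```

The record with FLAG 225 has both the first and last bits set, which is how SAM marks a middle segment; that matches the "m" read in the README's a4 diagram.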

test/tlen/a5.cram

517 Bytes
Binary file not shown.

test/tlen/a5.sam

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+@SQ SN:ref LN:20
+seq 193 ref 3 0 7M = 6 13 AAACCCC IIIIIII
+seq 97 ref 6 0 7M = 9 -13 CCCCCGG IIIIIII
+seq 145 ref 9 0 7M = 3 -13 CCGGGGG IIIIIII

test/tlen/a7.cram

534 Bytes
Binary file not shown.

test/tlen/a7.sam

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
1+
@SQ SN:ref LN:20
2+
seq 97 ref 3 0 11M = 6 14 AACCCCCGGGG IIIIIIIIIII
3+
seq 145 ref 6 0 11M = 3 -14 CCCCGGGGGTT IIIIIIIIIII
4+
seqr 161 ref 3 0 11M = 6 14 AACCCCCGGGG IIIIIIIIIII
5+
seqr 81 ref 6 0 11M = 3 -14 CCCCGGGGGTT IIIIIIIIIII
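
The TLEN column in a7.sam (14 and -14) is consistent with the usual SAM convention: the template spans from the leftmost mapped base to the rightmost, the leftmost segment gets the positive value and the others the negative. The Python sketch below applies that rule to POS and CIGAR only. It is an illustration, not htslib's CRAM decoder: the function names are hypothetical, it assumes all segments are mapped to the same reference, and its tie-break (the first record listed wins when starts are equal) is an arbitrary choice made here, which is exactly the kind of case the commit message describes as arbitrary.

```python
import re


def ref_length(cigar):
    """Number of reference bases consumed by a CIGAR string."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    # Only M, D, N, = and X advance along the reference.
    return sum(int(n) for n, op in ops if op in "MDN=X")


def template_lengths(records):
    """Signed TLEN for each (pos, cigar) segment of one template.

    pos is 1-based. Assumes every segment is mapped to the same reference.
    """
    starts = [pos for pos, _ in records]
    ends = [pos + ref_length(cigar) - 1 for pos, cigar in records]
    span = max(ends) - min(starts) + 1
    # Tie-break assumption: list.index picks the first record with the
    # smallest start. This is NOT claimed to match htslib's behaviour.
    leftmost = starts.index(min(starts))
    return [span if i == leftmost else -span for i in range(len(records))]


print(template_lengths([(3, "11M"), (6, "11M")]))           # a7 pair
print(template_lengths([(3, "7M"), (6, "7M"), (9, "7M")]))  # a4 triplet
```

For the a7 pair the reads cover 3..13 and 6..16, so the span is 16 - 3 + 1 = 14; for the a4 triplet the span is 15 - 3 + 1 = 13, matching the values in a4.sam.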

test/tlen/a7b.cram

533 Bytes
Binary file not shown.

test/tlen/a7b.sam

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+@SQ SN:ref LN:20
+seq 145 ref 6 0 11M = 3 -14 CCCCGGGGGTT IIIIIIIIIII
+seq 97 ref 3 0 11M = 6 14 AACCCCCGGGG IIIIIIIIIII
+seqr 81 ref 6 0 11M = 3 -14 CCCCGGGGGTT IIIIIIIIIII
+seqr 161 ref 3 0 11M = 6 14 AACCCCCGGGG IIIIIIIIIII
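
The README says each "b" variant is identical to its base file except for record order. A quick way to confirm that for a7.sam and a7b.sam is to compare the records as unordered collections. The sketch below embeds the two files' contents as shown in this commit (fields split on whitespace, since the page view does not preserve tabs); `records` and `same_records_any_order` are hypothetical helper names.

```python
A7 = """\
@SQ SN:ref LN:20
seq 97 ref 3 0 11M = 6 14 AACCCCCGGGG IIIIIIIIIII
seq 145 ref 6 0 11M = 3 -14 CCCCGGGGGTT IIIIIIIIIII
seqr 161 ref 3 0 11M = 6 14 AACCCCCGGGG IIIIIIIIIII
seqr 81 ref 6 0 11M = 3 -14 CCCCGGGGGTT IIIIIIIIIII
"""

A7B = """\
@SQ SN:ref LN:20
seq 145 ref 6 0 11M = 3 -14 CCCCGGGGGTT IIIIIIIIIII
seq 97 ref 3 0 11M = 6 14 AACCCCCGGGG IIIIIIIIIII
seqr 81 ref 6 0 11M = 3 -14 CCCCGGGGGTT IIIIIIIIIII
seqr 161 ref 3 0 11M = 6 14 AACCCCCGGGG IIIIIIIIIII
"""


def records(sam_text):
    """Alignment records as lists of fields, skipping '@' header lines."""
    return [
        line.split()
        for line in sam_text.splitlines()
        if line and not line.startswith("@")
    ]


def same_records_any_order(a, b):
    """True if two SAM texts hold the same records, ignoring order."""
    return sorted(records(a)) == sorted(records(b))


print(same_records_any_order(A7, A7B))
```

Because TLEN is part of each record, this comparison also shows that swapping the record order left the decoded TLEN values unchanged for this pair.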
