Hi,
We ran Merqury's hapmers.sh on child HiFi data at 150X coverage and parental WGS data at 30X coverage per parent.
At line 168 of plot_spectra_cn.R, the step that removes the lowest counts from the child-only k-mer multiplicity histogram uses a hard-coded value of 3. With our data, this causes xmax to be set too early (at too low a multiplicity), and the plot appears as follows.
The fixed value appears to filter too little for high-coverage datasets. When we change it to 5, xmax is determined properly, as shown below.
As an improvement, could the filtering use the count values recorded in cutoffs.txt as its threshold instead of a fixed constant?
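
To make the suggestion concrete, here is a rough sketch of the logic we have in mind. It is written in Python rather than R purely for illustration and is not the actual code from plot_spectra_cn.R. Everything in it is an assumption on our side: the layout of cutoffs.txt (one "name, whitespace, cutoff" pair per line), the function names, the labels "hapA"/"hapB", and the toy histogram values, which are invented to mimic a high-coverage error tail. Picking the tallest remaining bin is only a stand-in for whatever heuristic the script uses to set the axis limit.

```python
import io


def read_cutoffs(handle):
    """Parse lines of '<name> <cutoff>' into a dict (assumed file layout)."""
    cutoffs = {}
    for line in handle:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        cutoffs[fields[0]] = int(fields[1])
    return cutoffs


def filter_low_counts(hist, min_count):
    """Keep (multiplicity, n_kmers) rows with multiplicity >= min_count."""
    return [(m, n) for m, n in hist if m >= min_count]


def peak_multiplicity(hist):
    """Multiplicity of the tallest remaining bin (stand-in for the xmax logic)."""
    return max(hist, key=lambda row: row[1])[0]


# Invented toy histogram: an error tail that reaches past multiplicity 3,
# followed by the real peak around multiplicity 30.
TOY_HIST = [(1, 9000), (2, 5000), (3, 2500), (4, 1200),
            (5, 300), (6, 200), (30, 800), (31, 700)]

# Hypothetical cutoffs.txt content; the names are placeholders.
cutoffs = read_cutoffs(io.StringIO("hapA\t5\nhapB\t6\n"))

fixed = peak_multiplicity(filter_low_counts(TOY_HIST, 3))
adaptive = peak_multiplicity(filter_low_counts(TOY_HIST, cutoffs["hapA"]))
print("fixed threshold 3 ->", fixed)
print("cutoff from file  ->", adaptive)
```

In this toy case the fixed threshold of 3 leaves part of the error tail in place, so the tallest bin is still inside it, while the cutoff read from the file removes the tail and the real peak is found. Taking the threshold from cutoffs.txt would let it scale with coverage instead of relying on a single constant.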