Improvement of xmax determination in plot_spectra_cn on high-depth data #163

Hi,

We ran Merqury's hapmer.sh on child HiFi data sequenced at 150X coverage and parental WGS data sequenced at 30X coverage each.

At line 168 of plot_spectra_cn.R, the step that removes the initial (lowest-multiplicity) counts of the child-only k-mer multiplicity histogram uses a hard-coded value of 3. On our data this causes xmax to be set too far forward, and the plot appears as follows.

<img width="1800" height="1500" alt="Image" src="https://github.com/user-attachments/assets/7ac6cae1-175c-45c2-ba5d-f3bdad8c125c" />

The hard-coded value appears to filter too little on high-coverage datasets. When we change it to 5, xmax is determined properly, as shown below.

<img width="1800" height="1500" alt="Image" src="https://github.com/user-attachments/assets/2de62b4c-93ef-4f7d-9dec-2d38d1cb6fb2" />

As an improvement, how about basing this filtering step on the count values recorded in cutoffs.txt instead of a fixed constant?
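A rough sketch of the idea, written in Python purely for illustration (the real change would go in plot_spectra_cn.R). Everything here is an assumption rather than Merqury's actual code: the function names are hypothetical, cutoffs.txt is assumed to hold one `label cutoff` pair per line, the `mat`/`pat` labels are placeholders, "removing the initial counts" is assumed to mean dropping bins whose multiplicity is below the cutoff, and the histogram values are made up.

```python
from typing import Dict, List, Tuple

DEFAULT_CUTOFF = 3  # the value described above as hard-coded

def parse_cutoffs(text: str) -> Dict[str, int]:
    """Parse 'label cutoff' lines (ASSUMED layout of cutoffs.txt)."""
    cutoffs: Dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1].isdigit():
            cutoffs[fields[0]] = int(fields[1])
    return cutoffs

def pick_cutoff(cutoffs: Dict[str, int], label: str,
                default: int = DEFAULT_CUTOFF) -> int:
    """Use the recorded cutoff for `label`, falling back to the fixed default."""
    return cutoffs.get(label, default)

def drop_low_multiplicity(hist: List[Tuple[int, int]],
                          cutoff: int) -> List[Tuple[int, int]]:
    """Keep only (multiplicity, count) bins with multiplicity >= cutoff."""
    return [(m, c) for m, c in hist if m >= cutoff]

def peak_multiplicity(hist: List[Tuple[int, int]]) -> int:
    """Multiplicity of the bin with the largest count."""
    return max(hist, key=lambda mc: mc[1])[0]

# Made-up child-only histogram whose low-count tail extends past 3.
toy_hist = [(1, 1000), (2, 500), (3, 200), (4, 150), (5, 20), (6, 30), (7, 25)]

# Hypothetical cutoffs.txt content.
cutoffs = parse_cutoffs("mat\t5\npat\t4\n")

fixed = peak_multiplicity(drop_low_multiplicity(toy_hist, DEFAULT_CUTOFF))
data_driven = peak_multiplicity(
    drop_low_multiplicity(toy_hist, pick_cutoff(cutoffs, "mat")))
print(fixed, data_driven)
```

In this toy case the fixed value still leaves the low-count tail as the largest remaining bin, while the cutoff read from the file skips past it. Falling back to 3 when no cutoff is found would keep the current behavior for runs without a cutoffs.txt.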
