Hello Pierre,
I am running CONSENT-correct on a 20x PacBio dataset for a 1Gb genome. I cloned the version from your git repository on the 18th Feb 2021 (i.e. the latest version at the time).
It has not finished yet, but while trying to estimate how close to completion it might be, I have been checking the output.
The input dataset contains 3.2M uncorrected reads, but over 13M corrected reads have been written to corrected.fasta so far. This is not simply a case of reads being split: the input contains 23Gb of sequence, whereas the output already exceeds 82Gb.
I have seen some indications in other issues that this behaviour has been reported before, but I would value your opinion on whether and how I can salvage something from this run (or how to avoid the problem when repeating it).
I checked header uniqueness by sorting the output headers and running uniq, and found that the inflation is explained by duplicated headers, i.e. the same reads being written to the output more than once.
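For reference, here is a minimal sketch of the kind of clean-up I have in mind. It is my own hypothetical script, not part of CONSENT, and it assumes that records sharing an identical header line are redundant copies of the same corrected read. It streams the FASTA, keeps the first record for each header, and reports totals before and after. It also counts "conflicts", meaning repeated headers whose sequence differs from the first copy, because in that case keeping the first copy may not be the right choice. Only a 20-byte digest per unique header is held in memory, so the 82Gb of sequence is never loaded at once.

```python
import hashlib
import sys
from typing import Dict, Iterable, Iterator, TextIO, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs; sequences may span several lines."""
    header = None
    chunks = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def dedup_by_header(records: Iterable[Record], out: TextIO,
                    width: int = 80) -> Dict[str, int]:
    """Write the first record seen for each header to `out`.

    Assumption: records with an identical header line are copies of the same
    read. A repeated header whose sequence differs from the first copy is
    counted as a conflict (and still dropped) so it can be inspected.
    """
    seen: Dict[str, bytes] = {}  # header -> SHA-1 digest of the first sequence
    stats = {"records": 0, "bases": 0, "unique": 0,
             "unique_bases": 0, "conflicts": 0}
    for header, seq in records:
        stats["records"] += 1
        stats["bases"] += len(seq)
        digest = hashlib.sha1(seq.encode()).digest()
        if header in seen:
            if seen[header] != digest:
                stats["conflicts"] += 1
            continue
        seen[header] = digest
        stats["unique"] += 1
        stats["unique_bases"] += len(seq)
        out.write(f">{header}\n")
        # Re-wrap the sequence at a fixed line width.
        for i in range(0, len(seq), width):
            out.write(seq[i:i + width] + "\n")
    return stats


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python dedup_fasta.py corrected.fasta > corrected.dedup.fasta
    with open(sys.argv[1]) as handle:
        summary = dedup_by_header(parse_fasta(handle), sys.stdout)
    print(summary, file=sys.stderr)
```

If "conflicts" comes back as zero and "unique" is close to the 3.2M input reads, that would at least suggest the duplicates are exact repeats. Whether the deduplicated file is otherwise trustworthy is something I would still like you to confirm.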
Many thanks,
Annabel