Description
Description of feature
When the clipping mode is on (trim_msa, trim_ends_only), the gappy ends of an alignment can be removed, so an initial input sequence such as sequence_name may end up as sequence_name/5-147. The pipeline needs a new local module that parses the sequence name and recalculates the updated sequence coordinates every time a clipping tool is used. During parsing, there are two cases: either the sequence name does not originally contain any /, so the new start and end must be calculated from the match of the clipped sequence against the original sequence, or the sequence was already a slice of a longer sequence (its name contains /), so its start and end must be recalculated relative to the existing ones.
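The two parsing cases could be handled along these lines. This is only a minimal sketch, not an existing module: the function name `update_sequence_name`, its arguments, and the assumption that coordinates are 1-based and inclusive are all hypothetical. It also assumes the clipped residues occur as one contiguous substring of the original sequence, and it takes the first match, so repeated regions could be ambiguous.

```python
import re

# Matches names that already carry a region, e.g. "sequence_name/5-147".
RANGE_RE = re.compile(r"^(?P<base>.+)/(?P<start>\d+)-(?P<end>\d+)$")


def update_sequence_name(name: str, original_seq: str, clipped_seq: str) -> str:
    """Return the name with a recalculated /start-end suffix (hypothetical helper).

    original_seq: the sequence that `name` referred to before clipping.
    clipped_seq:  the same sequence after its gappy ends were clipped.
    Gap characters ('-' and '.') are ignored in both.
    """
    original = original_seq.replace("-", "").replace(".", "").upper()
    residues = clipped_seq.replace("-", "").replace(".", "").upper()
    if not residues:
        raise ValueError(f"{name}: no residues left after clipping")

    # 0-based position of the clipped residues inside the original sequence.
    offset = original.find(residues)
    if offset < 0:
        raise ValueError(f"{name}: clipped sequence not found in the original")

    match = RANGE_RE.match(name)
    if match:
        # Case 2: already a slice, so shift relative to the existing start.
        base = match["base"]
        old_start = int(match["start"])
    else:
        # Case 1: no region yet, so the original starts at position 1.
        base = name
        old_start = 1

    new_start = old_start + offset
    new_end = new_start + len(residues) - 1
    return f"{base}/{new_start}-{new_end}"
```

For instance, clipping `MKTAYIAKQR` down to `TAYIA` would turn `sequence_name` into `sequence_name/3-7`, while the same clipping applied to a sequence already named `sequence_name/5-14` would give `sequence_name/7-11`.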
However, this does not make sense when trim_msa is true and trim_ends_only is false (a combination that is not advisable anyway), because in that case gaps in the middle of the sequences, not just at the ends, can also be removed, and the meaning of the initial sequence range is lost. The logic flow should be controlled accordingly with conditional statements based on these parameters.
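The parameter check described above might look like the following. This is a sketch under the assumption that coordinates are only meaningful when both trim_msa and trim_ends_only are enabled; the function name `should_recalculate_coordinates` is hypothetical, and in the pipeline itself the same condition would more likely be expressed in the workflow logic.

```python
def should_recalculate_coordinates(trim_msa: bool, trim_ends_only: bool) -> bool:
    """Decide whether /start-end coordinates can be recalculated (hypothetical helper).

    - trim_msa off: nothing is clipped, so there is nothing to recalculate.
    - trim_msa on, trim_ends_only on: only the ends are clipped, so the
      remaining residues are still one contiguous range.
    - trim_msa on, trim_ends_only off: internal columns may be removed too,
      so a single start-end range no longer describes the sequence.
    """
    return trim_msa and trim_ends_only
```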
To avoid extra calculations mid-execution, instead of calculating the new coordinates every time after clipping (which may happen in different places with future updates), one idea is to calculate the actual coordinates once, just before outputting the final FASTA and MSA files of the pipeline execution. However, we will need to make sure that no intermediate files with wrong coordinates are saved, depending on the user-selected parameters.
The region (/start-end) should also be output in the family reps _mqc file.