feat: Reduce kmer spacing for short sequences #1242
Open
corneliusroemer wants to merge 1 commit into master from
This is a good heuristic to avoid a lack of seed matches in short sequences (~1% of RSV A sequences are only ~200bp long).
With the default kmer spacing of 50, a 180bp sequence gets only 3 kmers, which is not robust.
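To make the arithmetic concrete, here is a minimal sketch of the kind of adjustment described above. It is not the code in this PR: the function name `adjusted_kmer_spacing`, the value `10` for `MIN_KMER_NUM`, and the choice to ignore kmer length when counting seeds are all assumptions made for illustration.

```rust
/// Assumed value for illustration only; the PR names `MIN_KMER_NUM`
/// but its actual value is not stated here.
const MIN_KMER_NUM: usize = 10;

/// Hypothetical helper: returns the kmer spacing to use for a query of
/// `seq_len` bases, given the default spacing.
fn adjusted_kmer_spacing(seq_len: usize, default_spacing: usize) -> usize {
    // Guard against a zero spacing so the division below cannot panic.
    let default_spacing = default_spacing.max(1);

    // Approximate number of seeds at the default spacing
    // (kmer length is ignored in this sketch).
    let n_kmers = seq_len / default_spacing;

    if n_kmers >= MIN_KMER_NUM {
        // Long enough: keep the default spacing.
        default_spacing
    } else {
        // Too short: shrink the spacing so that roughly MIN_KMER_NUM
        // seeds fit into the sequence, but never go below 1.
        (seq_len / MIN_KMER_NUM).max(1)
    }
}
```

Under these assumptions, a 180bp sequence with default spacing 50 gets a spacing of 18 (about 10 seeds instead of 3), while a genome-length sequence keeps the default of 50.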
I put the parameter adjustment next to the other short-sequence heuristic, but this means we have to make params mutable. It might be better to move the parameter adjustment further up or down the stack. Thoughts @ivan-aksamentov?
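One possible way to avoid the mutable params, sketched here purely as a discussion aid: derive a per-query copy of the parameters and leave the caller's value untouched. The struct `SeedParams`, its fields, the function `params_for_query`, and the value of `MIN_KMER_NUM` are hypothetical and do not reflect the actual Nextclade types.

```rust
/// Hypothetical stand-in for the seed-matching parameters;
/// not the actual struct used in the codebase.
#[derive(Clone, Debug, PartialEq)]
struct SeedParams {
    kmer_distance: usize,
    kmer_length: usize,
}

/// Assumed value for illustration only.
const MIN_KMER_NUM: usize = 10;

/// Returns parameters suited to a query of `seq_len` bases.
/// The input is borrowed immutably, so callers do not need `mut`.
fn params_for_query(params: &SeedParams, seq_len: usize) -> SeedParams {
    // `.max(1)` only guards the division against a zero spacing.
    let spacing = params.kmer_distance.max(1);

    if seq_len / spacing >= MIN_KMER_NUM {
        // Enough seeds already: return an unchanged copy.
        return params.clone();
    }

    // Too few seeds: copy everything, overriding only the spacing.
    SeedParams {
        kmer_distance: (seq_len / MIN_KMER_NUM).max(1),
        ..params.clone()
    }
}
```

The trade-off is one small clone per query in exchange for keeping the shared params immutable. Whether that fits depends on where in the stack the adjustment ends up.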
We may or may not want to expose the const `MIN_KMER_NUM` as a CLI arg. I don't think users need to adjust it, so we can get away with hard-coding it for now.
See https://neherlab.slack.com/archives/C015PFP5V44/p1693408681322229