Description
Hi, I found that when the querydb is much smaller than the targetdb, Spacedust's performance seems to decrease significantly.
My three .faa files:
- cluster.faa (a selected set of known conserved gene clusters, 4 MB)
- target.faa (some selected prokaryotic genomes, 1 GB)
- test_target.faa (a subset of target.faa for testing, containing some of the conserved gene clusters, 1 MB)
I built test_target.faa this way so I could check whether Spacedust recovers the clusters I know are there.
My commands are below.
foldseek createdb test_target.faa test_targetDB --prostt5-model weights --threads 4 --gpu 1
spacedust createsetdb test_targetDB test_targetSetDB tmpFolder
I did the same for cluster.faa and target.faa.
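For reference, the database-building steps for all three files can be sketched as a loop. The intermediate names clusterDB and targetDB are assumed by analogy with test_targetDB, and the `run` and `build_setdb` helpers and the `DRY_RUN` switch are illustrative only, not part of Foldseek or Spacedust:

```shell
#!/bin/sh
# Sketch: build the Foldseek DB and the Spacedust set DB for each input file.
# clusterDB and targetDB are assumed names, following the test_targetDB pattern.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 is the FASTA basename without the .faa extension.
build_setdb() {
    name="$1"
    run foldseek createdb "${name}.faa" "${name}DB" --prostt5-model weights --threads 4 --gpu 1
    run spacedust createsetdb "${name}DB" "${name}SetDB" tmpFolder
}

for name in cluster test_target target; do
    build_setdb "$name"
done
```

With the default dry-run setting this only prints the six commands, which makes it easy to check the database names before running anything.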
spacedust clustersearch clusterSetDB test_targetSetDB result.tsv tmpFolder --search-mode 2 --num-iterations 2 --threads 4
The result looks great. Some pre-set conserved gene clusters in test_target.faa were detected by spacedust clustersearch.
Then I used the same pipeline on target.faa.
spacedust clustersearch clusterSetDB targetSetDB result.tsv tmpFolder --search-mode 2 --num-iterations 2 --threads 4
However, the result looks discouraging. Even some gene clusters that were detected in the subset (test_target) were missing from the results for the full set (target).
I'm not sure what the reason is. Can you give me some advice? I hope Spacedust continues to improve. Thanks!