
Accuracy of "spacedust clustersearch" decreases significantly as the targetdb grows #15

@DUANZIHAOO

Hi, I found that when the querydb is much smaller than the targetdb, Spacedust's detection performance seems to decrease significantly.

My three .faa files:

  1. cluster.faa (a selected set of determined conserved gene clusters, 4 Mb)
  2. target.faa (some selected prokaryotic genomes, 1 Gb)
  3. test_target.faa (subset of target.faa for test, with some conserved gene clusters, 1 Mb)

I built test_target.faa this way so that I could check Spacedust's results against clusters I already know are present.
My commands are below.

foldseek createdb test_target.faa test_targetDB --prostt5-model weights --threads 4 --gpu 1
spacedust createsetdb test_targetDB test_targetSetDB tmpFolder
I ran the same two commands for cluster.faa and target.faa to build clusterSetDB and targetSetDB, then searched:
spacedust clustersearch clusterSetDB test_targetSetDB result.tsv tmpFolder --search-mode 2 --num-iterations 2 --threads 4

The result looks great. Some of the conserved gene clusters I had deliberately included in test_target.faa were detected by spacedust clustersearch.
Then I used the same pipeline to run target.faa.

spacedust clustersearch clusterSetDB targetSetDB result.tsv tmpFolder --search-mode 2 --num-iterations 2 --threads 4

However, the result is discouraging. Even some gene clusters that were detected in the subset (test_target) were missed in the full set (target).
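
To make the loss concrete, the two runs could be compared with a small script like the one below. This is only a sketch under assumptions: each run is saved to its own file (result_subset.tsv and result_full.tsv are hypothetical names, since both commands above write to result.tsv), hit lines are tab-separated with the query and target identifiers in the first two columns, and lines starting with # are headers. I have not checked these against the actual output format, so the column indices may need adjusting.

```python
import sys


def parse_pairs(lines, query_col=0, target_col=1):
    """Collect (query, target) ID pairs from tab-separated result lines.

    ASSUMPTION: IDs sit in columns query_col/target_col and '#' marks header
    lines. Blank lines and lines with too few columns are skipped.
    """
    pairs = set()
    needed = max(query_col, target_col) + 1
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < needed:
            continue
        pairs.add((fields[query_col], fields[target_col]))
    return pairs


def lost_pairs(subset_lines, full_lines, query_col=0, target_col=1):
    """Return the sorted pairs found in the subset run but absent from the full run."""
    subset = parse_pairs(subset_lines, query_col, target_col)
    full = parse_pairs(full_lines, query_col, target_col)
    return sorted(subset - full)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python compare_runs.py result_subset.tsv result_full.tsv
    with open(sys.argv[1]) as subset_file, open(sys.argv[2]) as full_file:
        for query_id, target_id in lost_pairs(subset_file, full_file):
            print(f"{query_id}\t{target_id}")
```

Because test_target.faa is a subset of target.faa, the target identifiers should match between the two runs, so every pair printed is a hit that was found against the small database and not against the large one.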

I'm not sure what the reason is. Can you give me some advice? I hope Spacedust continues to improve. Thanks!
