Commit e11f445

Document --min-hits.
1 parent d7dee7b commit e11f445

1 file changed: +10 -0 lines changed

README.md

Lines changed: 10 additions & 0 deletions
@@ -92,6 +92,7 @@ i.e. the file format is automatically detected (alignment-writer v0.4.0 and newe
We recommend running [demix\_check](https://github.com/tmaklin/coreutils_demix_check) on the binned reads and/or [checkm](https://github.com/Ecogenomics/CheckM) on the bin-assembled genomes (BAGs) to evaluate the accuracy of the results.

## Working with large alignment files

### Compressing Themisto output files

For complex input data with many organisms, the pseudoalignment files from Themisto can become infeasibly large. In these cases, [alignment-writer](https://github.com/tmaklin/alignment-writer) can be used to compress the alignment files to less than 10% of their original size.

mSWEEP >=v2.0.0 can read the compressed alignments directly by running
@@ -100,6 +101,15 @@ mSWEEP --themisto-1 fwd_compressed.aln --themisto-2 rev_compressed.aln -i cluste
```

### Running estimation on large sparse alignments

If the target alignment is sparse, i.e. some target groups have few or no reads aligning to them in the whole sample, mSWEEP can be instructed to exclude these groups from the estimation by adding the `--min-hits 1` flag:

```
mSWEEP --themisto sparse.aln -i clustering.txt -t 2 --min-hits 1
```

This reduces the runtime and memory use of the estimation in proportion to the number of target groups removed. Using `--min-hits 1` does not affect the results beyond small differences in numerical precision.

The `--min-hits` flag also accepts values greater than 1, which prune target groups that have only a small number of aligned reads. Unlike `--min-hits 1`, a value greater than 1 will change the resulting abundance estimates.
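
The pruning rule described above can be illustrated with a small Python sketch. This is not mSWEEP's implementation: the functions `count_group_hits` and `groups_to_keep` are hypothetical, the alignments are assumed to be held in memory as one set of target indices per read, `target_groups` stands in for the `-i` clustering file by giving the group label of each target, and a read is assumed to count once towards every group it aligns to.

```python
from collections import Counter
from typing import Iterable, List, Set


def count_group_hits(alignments: Iterable[Set[int]], target_groups: List[str]) -> Counter:
    """Count, for every group, how many reads align to at least one of its targets."""
    # Start every group at zero so groups without any aligned reads are still reported.
    hits = Counter({group: 0 for group in target_groups})
    for targets in alignments:
        # A read hitting several targets in the same group is counted once for that group.
        for group in {target_groups[t] for t in targets}:
            hits[group] += 1
    return hits


def groups_to_keep(alignments: Iterable[Set[int]], target_groups: List[str], min_hits: int = 1) -> List[str]:
    """Return the sorted names of groups with at least `min_hits` aligned reads (hypothetical helper)."""
    hits = count_group_hits(alignments, target_groups)
    return sorted(group for group, n in hits.items() if n >= min_hits)


if __name__ == "__main__":
    # Group label of each target, in target order (plays the role of the clustering file).
    target_groups = ["A", "A", "B", "C", "D"]
    # Targets hit by each read; the last read aligns to nothing.
    reads = [{0, 1}, {0}, {2}, {0, 2}, {3}, set()]

    print(groups_to_keep(reads, target_groups, min_hits=1))  # ['A', 'B', 'C']
    print(groups_to_keep(reads, target_groups, min_hits=2))  # ['A', 'B']
```

With `min_hits=1` only group `D`, which has no aligned reads, is dropped, so no read loses any of its alignments. With `min_hits=2` group `C` is dropped as well even though the read `{3}` aligns to it, which is one intuitive way to see why thresholds above 1 can change the estimates.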
## (experimental) Reliability of abundance estimates
Add the `--run-rate` flag to calculate a relative reliability value for each abundance estimate using a variation of the [RATE method](https://doi.org/10.1214/18-AOAS1222)
```
