Commit 99be4e3

Merge pull request #23 from PROBIC/remove-kallisto-support
mSWEEP v2.0.0
2 parents eba257c + 6364ccc commit 99be4e3

49 files changed, +2582 -1844 lines changed

CHANGELOG.md

Lines changed: 200 additions & 0 deletions
@@ -0,0 +1,200 @@
# v2.0.0 (4 July 2023)
First major version increment of mSWEEP (breaks backwards compatibility).

## Output format changes
- Added the total number of reads to the abundances file (resolves #21).
- Renamed `total_hits` to `num_aligned` in the abundances file (#21).
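
Together, the two header fields let a user compute the fraction of reads that aligned. A minimal Python sketch, assuming the header lines look like `#num_reads:<TAB>1000` and `#num_aligned:<TAB>850` and that the abundance table starts with a non-`#` line. This layout is an assumption rather than something stated in this changelog, so check the field names against a real abundances file.

```python
# Minimal sketch: read the read counts from the header of an abundances file
# and report the fraction of reads that aligned.
# ASSUMPTION: header lines look like "#num_reads:\t<int>" and
# "#num_aligned:\t<int>"; the exact key names may differ in practice.
import io
from typing import Dict, TextIO


def read_header_counts(handle: TextIO) -> Dict[str, int]:
    """Collect integer-valued '#key:<TAB>value' header lines."""
    counts: Dict[str, int] = {}
    for line in handle:
        if not line.startswith("#"):
            break  # the header ends at the first non-comment line
        key, _, value = line[1:].rstrip("\n").partition("\t")
        key = key.rstrip(":")
        if value.strip().isdigit():
            counts[key] = int(value)
    return counts


def aligned_fraction(counts: Dict[str, int]) -> float:
    """Share of all input reads that had at least one alignment."""
    return counts["num_aligned"] / counts["num_reads"]


if __name__ == "__main__":
    example = io.StringIO(
        "#mSWEEP_version:\tv2.0.0\n"
        "#num_reads:\t1000\n"
        "#num_aligned:\t850\n"
        "#c_id\tmean_theta\n"
        "clusterA\t0.6\n"
        "clusterB\t0.4\n"
    )
    print(aligned_fraction(read_header_counts(example)))  # 0.85
```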

## New features
- Added an option to run the [mGEMS binning](https://github.com/PROBIC/mGEMS) algorithm directly from the mSWEEP call with the `--bin-reads` toggle. (https://github.com/PROBIC/mSWEEP/commit/54004d8c2408764dbd970cd1a49c253ede92ce5d)
- Support reading alignments compressed with [alignment-writer](https://github.com/tmaklin/alignment-writer). (https://github.com/PROBIC/mSWEEP/commit/f169fccbf976a0994863e0ffca4a62718778594c)
- Support reading alignments from standard input (cin). (https://github.com/PROBIC/mSWEEP/commit/9878f8ef4d342b72877409001e153ffc0433318a)

## Removed features
- Matching a fasta file to group indicators is no longer supported (removed the deprecated options `--fasta`, `--groups-list`, `--groups-delimiter`).

## Pseudoaligner support
- Removed kallisto support ([remove-kallisto-support](https://github.com/PROBIC/mSWEEP/tree/remove-kallisto-support))
- Removed support for Themisto v1.2.0 and older. ([remove-kallisto-support](https://github.com/PROBIC/mSWEEP/tree/remove-kallisto-support))

## Installation
- Added a conda recipe and instructions on installing mSWEEP from bioconda. (#22)

## Build pipeline changes
- Require C++17 to build from source.
- Removed support for building zlib from source. (https://github.com/PROBIC/mSWEEP/commit/5c94591b392932efd7b1d16cb6c3e3ddc688c8e1)
- Added the `CMAKE_BUILD_WITH_FLTO` flag for building with link-time optimization. (https://github.com/PROBIC/mSWEEP/commit/ee1db015e9902eca05293a416576e89f1d54af7a)

## Internal changes
- Bump C++ standard to C++17.
- Rewrote most of the codebase.
- Fixed dependency versions to avoid conflicts.


# v1.6.3 (1 February 2023)
Fix build issues caused by an update in one of the dependencies.

# v1.6.2 (12 August 2022)
Dependency updates and bug fixes.

## Bug fixes
- Fix the interaction of --no-fit-model with WriteResults.
- No longer erroneously try to write the probability matrix when --no-fit-model is toggled.

## Updated dependencies
### Use telescope v0.4.0
- About 10x speedup in reading pseudoalignments.
- May reduce the memory footprint on large inputs.

### Use rcgpar v1.0.2
- Fixes MPI estimation when the input data dimensions exceed the capacity of 32-bit signed integers.
- Enables compilation without MPI support even when MPI headers are present on the system.
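
To give a sense of scale for the 32-bit limit: one place it commonly bites is MPI, whose classic C interface takes element counts as `int`. The Python sketch below uses made-up dimensions and only illustrates the arithmetic. It is not rcgpar's code, and the exact cause of the bug fixed in rcgpar v1.0.2 is not described here.

```python
# Illustration of why 32-bit signed counts break on large inputs.
# The dimensions below are made-up example numbers, not measurements.
INT32_MAX = 2**31 - 1  # 2147483647, the largest 32-bit signed value


def fits_int32(n_rows: int, n_cols: int) -> bool:
    """True if the cell count of an n_rows x n_cols matrix fits in an int32."""
    return n_rows * n_cols <= INT32_MAX


def wrap_int32(value: int) -> int:
    """Reduce value to the 32-bit two's-complement range.

    Signed overflow is undefined behaviour in C and C++; on common
    hardware it often shows up as this wraparound.
    """
    return (value + 2**31) % 2**32 - 2**31


# Made-up example dimensions: 500 groups x 5 million reads.
n_groups, n_reads = 500, 5_000_000
print(fits_int32(n_groups, n_reads))   # False
print(wrap_int32(n_groups * n_reads))  # -1794967296
```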

## Internal changes
- Renamed log.hpp -> msweep_log.hpp and corrected the header guard to avoid conflicts with some dependencies.

## Build pipeline
- Use the CMAKE_ENABLE_MPI_SUPPORT flag to compile with or without support for MPI.


# v1.6.1 (5 May 2022)
## Changes
- Updated dependency bxzstr to v1.1.0.
- Disabled zstd support from bxzstr by default.
  - Support can be enabled when compiling by changing -D ZSTD_FOUND=0 to -D ZSTD_FOUND=1 in config/CMakeLists-bxzstr.txt.in, but this also requires handling the linking in the main CMakeLists.txt file.


# v1.5.2 (20 November 2021)
Added compatibility with the changes to Themisto's command line interface and the new index file structure in Themisto v2.0.0.

## Changes
- Changed the --themisto-index argument so that the program aborts with an error telling the user to rerun mSWEEP without --themisto-index if the Themisto v2.0.0 index format is detected.

## Documentation
- Updated documentation with usage instructions for both Themisto <=v1.2.0 and >=v2.0.0.


# v1.6.0 (15 November 2021)
November is sometimes in May edition.

## New features
- Added MPI support and instructions for using it.
- Added support for reading in likelihoods written with the --write-likelihoods toggle (resolves #12).

## Changes
- Many internal changes and code refactoring.
- Use a new implementation of the model fitting code from rcgpar, which contains tests, better multiprocessing support, and a distributable (MPI-compatible) version of the model fitting code.

## Bugfixes
- Fixed --print-probs so that it always prints to cout, as the documentation states.


# v1.5.1 (9 November 2021)
Finally published edition.

## New features
- New --version toggle prints the version of the program.
- New --cite toggle prints the citation information for the mSWEEP article in Wellcome Open Research.

## Documentation
- Added info about the DOIs for specific versions of mSWEEP to the readme file.

## Build pipeline changes
- Dependencies shared by mSWEEP and some of its other dependencies are now downloaded only once and reused.
- Download cxxio when building instead of shipping it with mSWEEP.

## Files restructuring
- Moved config files from the main folder into config/.

## Code restructuring
- Renamed main.cpp to mSWEEP.cpp.
- Use functions from dependencies when available instead of copying them into the mSWEEP source code.


# v1.5.0 (15 October 2021)
Fall foliage edition: code restructuring and new features.

## New features
Options to extract the likelihood matrix that mSWEEP uses internally:
- --write-likelihood: output the likelihood matrix in tab-separated matrix format. Writes to a file with the `_likelihoods.txt` suffix if -o is specified; otherwise the matrix is emitted to cout.
- --write-likelihood-bitseq: same as above, but the output is in a format compatible with BitSeq's estimateExpression and estimateVBExpression programs. Files from this toggle have the `_bitseq_likelihoods.txt` suffix.
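
The output routing described for --write-likelihood can be sketched in Python as follows: a tab-separated matrix goes to `<prefix>_likelihoods.txt` when -o is given and to standard output otherwise. The names `write_likelihoods`, `format_matrix`, and `out_prefix` are hypothetical, and the row/column orientation of the real likelihood matrix is not specified here.

```python
# Sketch of the --write-likelihood output routing: to "<prefix>_likelihoods.txt"
# when -o is given, otherwise to stdout.
# ASSUMPTIONS: out_prefix stands in for the -o argument; one matrix row per line.
import io
import sys
from typing import Optional, Sequence, TextIO


def format_matrix(matrix: Sequence[Sequence[float]]) -> str:
    """Render each matrix row as one tab-separated line."""
    return "".join("\t".join(str(x) for x in row) + "\n" for row in matrix)


def write_likelihoods(matrix: Sequence[Sequence[float]],
                      out_prefix: Optional[str] = None,
                      stream: Optional[TextIO] = None) -> Optional[str]:
    """Write to '<out_prefix>_likelihoods.txt' if a prefix is given, else to a stream."""
    text = format_matrix(matrix)
    if out_prefix is None:
        target = stream if stream is not None else sys.stdout
        target.write(text)
        return None
    path = f"{out_prefix}_likelihoods.txt"
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
    return path


if __name__ == "__main__":
    buffer = io.StringIO()
    write_likelihoods([[-1.5, -2.0], [0.0, -3.25]], stream=buffer)
    print(buffer.getvalue(), end="")
```

Returning the path (or `None` for stream output) lets a caller report where the matrix ended up without re-deriving the file name.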

Added --no-fit-model toggle to skip the relative abundance estimation step:
- --no-fit-model: skip estimating the relative abundances. Useful if only the likelihood matrix is needed.

Support supplying multiple groupings via the -i or --groups-list toggles:
- Several groupings can be supplied by appending them as columns to the file given to either the -i or the --groups-list option.
- The column delimiter is defined by the --groups-delimiter argument (default: tab).
- If there are several groupings and output to a file is requested, the output is written to the file specified by the -o argument with the column index appended. Otherwise the results from all runs are printed to cout.
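
The multi-column grouping layout amounts to a transpose of the -i file: column j holds one complete grouping. A minimal Python sketch with hypothetical helpers. The 1-based `<prefix>_<index>` naming is an assumption, since the entry above only says that the column index is appended.

```python
# Sketch: split a multi-column grouping file into one grouping per column
# and derive per-column output names.
# ASSUMPTIONS: one line per reference sequence, equal column counts, and
# output names of the form "<prefix>_<1-based column index>" (hypothetical).
from typing import List, Sequence


def split_groupings(lines: Sequence[str], delimiter: str = "\t") -> List[List[str]]:
    """Turn the rows of a multi-column grouping file into one list per column."""
    rows = [line.rstrip("\n").split(delimiter) for line in lines if line.strip()]
    if not rows:
        return []
    n_cols = len(rows[0])
    if any(len(row) != n_cols for row in rows):
        raise ValueError("every line must have the same number of columns")
    # Transpose: grouping j lists the group label of every reference in order.
    return [[row[j] for row in rows] for j in range(n_cols)]


def output_names(prefix: str, n_groupings: int) -> List[str]:
    """Hypothetical per-grouping output names: '<prefix>_<1-based column>'."""
    return [f"{prefix}_{j + 1}" for j in range(n_groupings)]


if __name__ == "__main__":
    rows = ["ST1\tcladeA\n", "ST2\tcladeA\n", "ST1\tcladeB\n"]
    groupings = split_groupings(rows)
    for name, labels in zip(output_names("out", len(groupings)), groupings):
        print(name, labels)
```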

## Bugfixes
- Removed the extra line at the end of the output when running in bootstrap mode.

## Internal changes
- Some code restructuring to make adding new features easier.
- Hopefully improved code readability, and added a bit of documentation.
- Renamed some variables and functions that used the old "bitfields" naming scheme.
- Resolved some compiler warnings that arose when compiling with -Wall -Wextra -Wpedantic.
- Made several integer types explicit with (u)int32_t-style typing.
- The Grouping and Reference structs have been separated and made into proper classes.


# v1.4.0 (10 March 2020)
Beware the clichés of software naming edition.

## New features
- Support parallel processing through the '-t' flag with excellent scaling in larger problems.
- Add the possibility to match the input grouping indicators to the fasta file through the '--fasta' and '--groups-list' options.
- Add the '--bootstrap-count' option, which allows resampling fewer input alignments than the original sample contains.
- Add the possibility to specify the initial random seed for bootstrapping through the '--seed' option.
- Support reading in files compressed with bz2 or lzma if compiled on a machine that supports them.
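
The parameters that '--bootstrap-count' and '--seed' control can be sketched as a textbook nonparametric bootstrap that samples alignment indices with replacement. Whether mSWEEP resamples exactly this way is an assumption, and `bootstrap_indices` is a hypothetical helper.

```python
# Sketch of bootstrap resampling with an explicit sample size and seed,
# mirroring what --bootstrap-count and --seed control.
# ASSUMPTION: alignments are resampled with replacement (textbook bootstrap);
# this is not mSWEEP's own code.
import random
from typing import List


def bootstrap_indices(n_alignments: int, bootstrap_count: int, seed: int) -> List[int]:
    """Draw bootstrap_count indices into 0..n_alignments-1, with replacement."""
    if n_alignments <= 0 or bootstrap_count <= 0:
        raise ValueError("n_alignments and bootstrap_count must be positive")
    # A dedicated generator makes the draw depend only on the seed.
    rng = random.Random(seed)
    return rng.choices(range(n_alignments), k=bootstrap_count)


if __name__ == "__main__":
    # Resample 100 of 1000 alignments; rerunning with the same seed repeats the draw.
    sample = bootstrap_indices(1000, 100, seed=26012023)
    print(len(sample), sample[:5])
```

Seeding a dedicated `random.Random` instance, rather than the module-level generator, keeps the draw reproducible even if other code also consumes random numbers.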

## Better error checking
- Validate that all input and output files exist and are accessible.
- Add the possibility to validate the input grouping indicators when using Themisto pseudoalignments (resolves #4).
- Catch errors in several places that escaped in earlier versions.
- More informative error messages in the above-mentioned cases.

## More efficient resource usage
- Parallel processing in the RCG optimization using OpenMP.
- Memory usage reduced by ~40% in large problems.
- Single-core performance increased by ~10% in large problems.

## Better build pipeline
- Download dependencies when running cmake.
- Build without OpenMP if it is not supported.
- More aggressive compiler optimization flags.
- Support build and optimization with the Intel C compiler.

## Internal changes
- Improve code structure and legibility.
- Use an external library (telescope) to read in pseudoalignments from both kallisto and Themisto.
- Better internal storage for the pseudoalignments.
- Change the (rarely run) reset step in the RCG optimization to be computationally more expensive but consume significantly less memory.
- Separate bootstrap and regular sample processing classes.


# v1.3.2 (30 January 2020)
Fix working with a grouped Themisto index.

- Add instructions on how to use either a grouped or an ungrouped index.
- mSWEEP no longer attempts to infer the grouping.
- Instead, everything should be handled by modifying the file supplied with -i.


# v1.3.1 (21 January 2020)
- Fix compilation issues on some systems.


# v1.2.2 (3 September 2019)
Quality-of-life improvements, including:

- The bootstrapping output format is now similar to that of estimation without bootstrapping.
- Add the number of bootstrap iterations to the output file.
- Print a status indicator when running bootstrapping.
- Internal changes to code structure.


# v1.1.0 (17 December 2018)
Prepublication edition.
This is the version that was used to run experiments in the [mSWEEP preprint (2019)](https://www.biorxiv.org/content/10.1101/332544v2), and the first release to print the version number when run.

0 commit comments

Comments
 (0)