- fix potentially incorrect generation of vocabulary map files on 32-bit systems (this appears to have only impacted non-default block sizes)
- fix calculation of average precision in `ir_eval` (the denominator was incorrect)
- specify that labels are required for the `file_corpus` document list; this allows spaces in the path to each document
- additions to the graph library:
    - myopic search
    - BFS
    - preferential attachment graph generation model (supports node attractiveness from different distributions)
    - betweenness centrality
    - eigenvector centrality
- added a new natural language parsing library:
    - parse tree library (visitor-based)
    - shift-reduce constituency parser for generating phrase structure trees
    - reimplementation of evalb metrics for evaluating parsers
    - new filter for Penn Treebank-style normalization
- added a greedy averaged Perceptron-based tagger
- demo application for various basic text processing (profile)
- basic iostreams that support gzip compression (if compiled with ZLib support)
- added iteration method for `stats::multinomial` seen events
- added expected value and entropy functions to `stats` namespace
- added `linear_model`: a generic multiclass classifier storage class
- added `gz_corpus`: a compressed version of `line_corpus`
- added macros for generating type safe identifiers with user defined literal suffixes
- added a persistent stack data structure to `meta::util`
- added operator== for `util::optional<T>`
- better CMake support for building the libsvm modules
- better CMake support for downloading unit-test data
- improved setup guide in README (for OS X, Ubuntu, Arch, and EWS/ENGRIT)
- tree analyzers refactored to use the new parser library (removes dependency on outside toolkits for generating tree files)
- analyzers that are not part of the "core" have been moved into their respective folders (so `ngram_pos_analyzer` is in `src/sequence`, `tree_analyzer` is in `src/parser`)
- `make_index` now checks if the files exist before loading an index, and if they are missing creates a new one (as opposed to just throwing an exception on a nonexistent file)
- cpptoml upgraded to support TOML v0.4.0
- enable extra warnings (-Wextra) for clang++ and g++
- fix `sequence_analyzer::analyze() const` when applied to untagged sequences (was throwing when it shouldn't)
- ensure that the inverted index object is destroyed before uninverting occurs in the creation of a `forward_index`
- fix bug where `icu_tokenizer` would output spaces as tokens
- fix bugs where index objects were not destroyed before trying to delete their files in the unit tests
- fix bug in `sparse_vector::find()` where it would return a non-end iterator when asked to find an element that does not exist
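
The `find()` entry above does not describe how the bug arose. One common way for a sorted container to return a non-end iterator for a missing key is a binary search without an equality check afterwards. The sketch below illustrates that pattern only. The `entry` and `sparse` types and the `find_entry` function are hypothetical stand-ins, not MeTA's `sparse_vector`.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a sparse vector: (index, value) pairs kept
// sorted by index.
using entry = std::pair<std::size_t, double>;
using sparse = std::vector<entry>;

sparse::const_iterator find_entry(const sparse& vec, std::size_t index)
{
    auto it = std::lower_bound(
        vec.begin(), vec.end(), index,
        [](const entry& e, std::size_t idx) { return e.first < idx; });

    // lower_bound yields the first entry whose index is >= the requested
    // one, so it may point at a *different* element. Returning `it`
    // directly would hand back a non-end iterator for a missing index.
    if (it != vec.end() && it->first == index)
        return it;
    return vec.end();
}
```

For entries at indices 1, 4, and 9, looking up index 5 lands `lower_bound` on the entry for 9. The equality check is what turns that into `end()`.
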
- demo application for CRF-based POS tagging
- `nearest_centroid` classifier
- basic statistics library for representing relevant probability distributions
- `sparse_vector` utility class
- `ngram_pos_analyzer` now uses the CRF internally (see issue #46)
- `knn` classifier now supports weighted knn
- `filesystem::copy_file()` no longer hangs without progress reporting with large files
- CMake build system now includes `INTERFACE` targets (better inclusion as a subproject in external projects)
- MeTA can now (optionally) be built with C++14 support
- `language_model_ranker` scoring function corrected (see issue #50)
- `naive_bayes` classifier scoring corrected
- several incorrect instances of `numeric_limits<double>::min()` replaced with the intended `numeric_limits<double>::lowest()`
- fix compilation with versions of ICU < 4.4
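
The `numeric_limits<double>` entry above reflects a well-known C++ pitfall: for floating-point types, `min()` is the smallest positive normal value, while `lowest()` is the most negative finite value. The sketch below shows how the wrong choice breaks a running maximum over negative scores. Both helper functions are hypothetical, and the assumption that the affected code tracked a maximum over negative values (such as log-probabilities) is an illustration, not something the changelog states.

```cpp
#include <limits>
#include <vector>

static_assert(std::numeric_limits<double>::min() > 0.0,
              "min() is the smallest positive normal double");
static_assert(std::numeric_limits<double>::lowest() < 0.0,
              "lowest() is the most negative finite double");

// Hypothetical helper: running maximum seeded with lowest(), so any
// finite score replaces the initial value.
double max_score(const std::vector<double>& scores)
{
    double best = std::numeric_limits<double>::lowest();
    for (double s : scores)
        if (s > best)
            best = s;
    return best;
}

// Same loop seeded with min(): every negative score compares less than
// the tiny positive seed, so the seed itself is returned.
double max_score_buggy(const std::vector<double>& scores)
{
    double best = std::numeric_limits<double>::min();
    for (double s : scores)
        if (s > best)
            best = s;
    return best;
}
```

Given the scores `{-3.0, -1.5}`, `max_score` returns `-1.5`, while `max_score_buggy` returns the positive seed value and never reports either score.
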
- sequence analyzer and CRF implementation
- basic language model
- basic directed and undirected graphs
- restructure CMakeLists
- Initial release.