A pure Julia implementation of the Uniform Manifold Approximation and Projection (UMAP) dimension reduction algorithm.
McInnes, L., Healy, J., Melville, J. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv:1802.03426, 2018.
result = UMAP.fit(data, n_components; n_neighbors, metric, ...) -> UMAP.UMAPResult
result.embedding

The fit function takes two arguments, data (either a column-major matrix or a vector of "points", e.g. vectors), n_components (the number of dimensions in the output embedding), and various keyword arguments. Several important ones are:
- n_neighbors: This controls how many neighbors around each point are considered to be part of its local neighborhood. Larger values will result in embeddings that capture more global structure, while smaller values will preserve more local structures.
- metric: The distance (semi-)metric to use when calculating distances between points. This can be any subtype of the SemiMetric type from the Distances.jl package, including user-defined types.
- min_dist: This controls the minimum spacing of points in the embedding. Larger values will cause points to be more evenly distributed, while smaller values will preserve more local structure.

UMAP.fit returns a UMAPResult struct, with the output embedding at
result.embedding.
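
The keyword arguments described above can be combined in a single call. The following is a minimal sketch rather than a recommended configuration: the toy data, the variable name `embedding`, and the specific values chosen for `n_neighbors` and `min_dist` are illustrative assumptions, and `Euclidean()` is just one of the metrics provided by Distances.jl.

```julia
using UMAP
using Distances

# Toy data (assumption): 200 points in 10 dimensions, one point per column.
data = rand(Float32, 10, 200)

# Embed into 2 dimensions. The keyword values below are illustrative only.
result = UMAP.fit(data, 2;
                  n_neighbors = 15,      # size of each point's local neighborhood
                  metric = Euclidean(),  # any subtype of Distances.SemiMetric
                  min_dist = 0.1)        # minimum spacing of points in the embedding

# The output embedding is stored on the returned UMAPResult.
embedding = result.embedding
```

Raising `n_neighbors` in this call should shift the embedding toward global structure, and lowering it toward local structure, as described in the list above.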
UMAP can use a precomputed distance matrix instead of finding the nearest neighbors itself. In this case, the distance matrix is passed as data and the metric keyword argument should be :precomputed. Example:
result = UMAP.fit(distances, n_components; metric=:precomputed)

After embedding a dataset, we can transform new points into the same embedding space via UMAP.transform:
result = UMAP.fit(data, n_components; <kwargs>)
transform_result = UMAP.transform(result, new_data) -> UMAP.UMAPTransformResult
transform_result.embedding

Note that the type of new_data must match the original data exactly. The parameterization used for fit is re-used where appropriate in transform, via the UMAPResult struct.
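
The two workflows above, fitting from a precomputed distance matrix and transforming new points, can be sketched end to end as follows. This is a hedged illustration, not a definitive recipe: the toy data and all variable names are assumptions, `pairwise` and `Euclidean` come from Distances.jl, and the `n_neighbors` value is arbitrary.

```julia
using UMAP
using Distances

# Toy data (assumption): one point per column.
data = rand(Float32, 10, 200)
new_data = rand(Float32, 10, 20)   # same container and element type as `data`

# 1. Precomputed distances: build the full pairwise distance matrix between
#    columns and pass it in place of the data, with metric = :precomputed.
distances = pairwise(Euclidean(), data, dims = 2)   # 200 x 200 matrix
precomputed_result = UMAP.fit(distances, 2; metric = :precomputed)

# 2. Fit on the raw data, then project unseen points into the same space.
result = UMAP.fit(data, 2; n_neighbors = 15)

# transform expects new_data to have exactly the same type as the fitted data.
@assert typeof(new_data) == typeof(data)
transform_result = UMAP.transform(result, new_data)

# Embedding of the new points, stored on the returned UMAPTransformResult.
new_embedding = transform_result.embedding
```

Checking `typeof(new_data) == typeof(data)` before calling `UMAP.transform` is a cheap way to catch a mismatch such as `Float64` versus `Float32` early.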
The docs have more examples.
- Understanding UMAP
- For a great description of how UMAP works, see this page from the Python UMAP documentation
- If you're familiar with t-SNE, then this page describes UMAP with similar vocabulary to that dimension reduction algorithm