UMAP.jl


A pure Julia implementation of the Uniform Manifold Approximation and Projection dimension reduction algorithm

McInnes, L, Healy, J, Melville, J, UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv:1802.03426, 2018

Usage

result = UMAP.fit(data, n_components; n_neighbors, metric, ...) -> UMAP.UMAPResult
result.embedding

The fit function takes two positional arguments, data (either a column-major matrix or a vector of "points", e.g. vectors) and n_components (the number of dimensions in the output embedding), plus various keyword arguments. Several important ones are:

n_neighbors: This controls how many neighbors around each point are considered to be part of its local neighborhood. Larger values will result in embeddings that capture more global structure, while smaller values will preserve more local structure.
metric: The distance (semi-)metric to use when calculating distances between points. This can be any subtype of the SemiMetric type from the Distances.jl package, including user-defined types.
min_dist: This controls the minimum spacing of points in the embedding. Larger values will cause points to be more evenly distributed, while smaller values will preserve more local structure.

UMAP.fit returns a UMAPResult struct, with the output embedding at result.embedding.
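
As a minimal sketch of the call described above, the following embeds random data into two dimensions. The data, its size, and the keyword values are arbitrary choices made for illustration, not recommended settings; only the keyword arguments named above are used.

```julia
using UMAP
using Distances: Euclidean

# Illustrative data (an assumption): 200 points with 10 features each,
# stored column-major, i.e. one point per column.
data = rand(10, 200)

# Embed into 2 dimensions. The keyword values below are arbitrary.
result = UMAP.fit(data, 2; n_neighbors=15, metric=Euclidean(), min_dist=0.1)

# The output embedding is stored on the returned UMAPResult.
embedding = result.embedding
```

Since n_neighbors defines the size of each local neighborhood, it should be smaller than the number of points in data.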

Using precomputed distances

UMAP can use a precomputed distance matrix instead of finding the nearest neighbors itself. In this case, the distance matrix is passed as data and the metric keyword argument should be :precomputed. Example:

result = UMAP.fit(distances, n_components; metric=:precomputed)
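
One way to obtain such a matrix, sketched here under the assumption that the points are the columns of a matrix, is the pairwise function from Distances.jl. The random data and the choice of Euclidean distance are illustrative only; any square matrix of pairwise distances could be passed in the same way.

```julia
using UMAP
using Distances: Euclidean, pairwise

# Illustrative data (an assumption): 200 points, one per column.
data = rand(10, 200)

# 200x200 matrix of distances between every pair of columns.
distances = pairwise(Euclidean(), data; dims=2)

# Pass the distance matrix in place of the raw data.
result = UMAP.fit(distances, 2; metric=:precomputed)
embedding = result.embedding
```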

Transforming new data

After embedding a dataset, we can transform new points into the same embedding space via UMAP.transform:

result = UMAP.fit(data, n_components; <kwargs>)

transform_result = UMAP.transform(result, new_data) -> UMAP.UMAPTransformResult
transform_result.embedding

Note that the type of new_data must match the original data exactly. The parameterization used for fit is re-used where appropriate in transform, via the UMAPResult struct.
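
The fit-then-transform flow can be sketched as follows. The random matrices are placeholders, and the sketch assumes the keyword arguments of fit can be left at their defaults; the point is that new_data has the same type as data (here both are Matrix{Float64}) and the same number of features per point.

```julia
using UMAP

# Illustrative data (an assumption): points are columns with 10 features each.
data = rand(10, 200)      # points used to fit the embedding
new_data = rand(10, 20)   # new points, same type and feature count as `data`

# Fit on the original data, leaving the keyword arguments unset.
result = UMAP.fit(data, 2)

# Place the new points in the embedding space of `result`.
transform_result = UMAP.transform(result, new_data)
new_embedding = transform_result.embedding
```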

Examples

The docs have more examples.

External Resources

Understanding UMAP
For a great description of how UMAP works, see this page from the Python UMAP documentation
If you're familiar with t-SNE, then this page describes UMAP with similar vocabulary to that dimension reduction algorithm
