GitHub - datavorous/spheni: Lightweight Vector Search Library focused on Memory Efficiency

Spheni

A lightweight vector search library focused on memory-efficient approximate nearest neighbor search.

Index

Overview
Features
Getting Started
API Reference
Benchmarks
Roadmap
References

Overview

Spheni prioritizes memory efficiency over accuracy and is meant to act as a candidate generation system.

It implements inverted indexing and product quantization with residual encoding to reduce memory usage while supporting fast similarity search.
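To make "product quantization with residual encoding" concrete, here is a minimal sketch of the encoding step. It is not Spheni's implementation: the `Codebook` layout and the names `squared_l2` and `encode_residual_pq` are hypothetical, and the codebooks are assumed to be already trained (for example with k-means per subspace).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical layout, not Spheni's actual data structures:
// codebooks[m][k] is the k-th codeword (length dim / M) of subspace m.
using Codebook = std::vector<std::vector<float>>;

// Squared L2 distance between two length-n float arrays.
inline float squared_l2(const float* a, const float* b, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Encode `vec` as M one-byte codes of its residual relative to `centroid`.
inline std::vector<std::uint8_t> encode_residual_pq(
    const std::vector<float>& vec,
    const std::vector<float>& centroid,
    const std::vector<Codebook>& codebooks) {
    const std::size_t dim = vec.size();
    const std::size_t M = codebooks.size();
    assert(centroid.size() == dim && M > 0 && dim % M == 0);
    const std::size_t dsub = dim / M;

    // Step 1: residual = vector minus its coarse (IVF) centroid.
    std::vector<float> residual(dim);
    for (std::size_t i = 0; i < dim; ++i) residual[i] = vec[i] - centroid[i];

    // Step 2: for each subspace, keep only the index of the nearest codeword.
    std::vector<std::uint8_t> codes(M);
    for (std::size_t m = 0; m < M; ++m) {
        assert(!codebooks[m].empty() && codebooks[m].size() <= 256);
        float best = std::numeric_limits<float>::max();
        for (std::size_t k = 0; k < codebooks[m].size(); ++k) {
            assert(codebooks[m][k].size() == dsub);
            const float d =
                squared_l2(&residual[m * dsub], codebooks[m][k].data(), dsub);
            if (d < best) {
                best = d;
                codes[m] = static_cast<std::uint8_t>(k);
            }
        }
    }
    return codes;
}
```

With at most 256 codewords per subspace, each vector is stored as M bytes instead of `dim` floats, which is where most of the memory saving comes from. Encoding the residual rather than the raw vector lets the codebooks spend their precision on the offset from the cluster centroid.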

With Cohere 1M Embeddings, the current benchmark reports up to 213.9x compression (13.69 MB index at M=8) and up to 94.20% Recall@10-in-100 (M=64, nprobe=32), with the baseline at 80.1x compression and 83.25% Recall@10-in-100 (M=32, nprobe=32).
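As a rough sanity check on the compression figure, the ratio can be computed as raw float32 size divided by index size. The sketch below rests on two assumptions that the text above does not state: the embeddings are 768-dimensional float32 vectors, and "MB" means 2^20 bytes. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Size in bytes of n uncompressed float32 vectors of dimension dim.
inline double raw_float32_bytes(std::size_t n, std::size_t dim) {
    return static_cast<double>(n) * static_cast<double>(dim) * sizeof(float);
}

// Compression ratio = raw size / index size (both in bytes).
inline double compression_ratio(double raw_bytes, double index_bytes) {
    assert(index_bytes > 0.0);
    return raw_bytes / index_bytes;
}
```

Under those assumptions, 1,000,000 vectors take 3,072,000,000 bytes (about 2929.7 MB), and `compression_ratio(raw_float32_bytes(1000000, 768), 13.69 * 1024 * 1024)` comes out at roughly 214, in line with the reported 213.9x.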

Features

Indexes: Flat, IVF, FlatPQ, IVF-PQ
Metrics: Cosine similarity, L2 distance
Operations: train, add, search
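For reference, the two metrics can be written as below. This is a plain, unoptimized sketch with hypothetical function names, not Spheni's internal code; in particular, whether Spheni reports squared or plain L2 distance is not stated here, so the squared form is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Squared Euclidean (L2) distance; smaller means closer.
inline float l2_squared(const std::vector<float>& a,
                        const std::vector<float>& b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Cosine similarity; larger means closer. Returns 0 if either vector is zero.
inline float cosine_similarity(const std::vector<float>& a,
                               const std::vector<float>& b) {
    assert(a.size() == b.size());
    float dot = 0.0f, norm_a = 0.0f, norm_b = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        norm_a += a[i] * a[i];
        norm_b += b[i] * b[i];
    }
    if (norm_a == 0.0f || norm_b == 0.0f) return 0.0f;
    return dot / (std::sqrt(norm_a) * std::sqrt(norm_b));
}
```

If vectors are normalized to unit length up front, cosine similarity reduces to a dot product, which is a common reason for libraries to offer a normalization option.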

Getting Started

Build

Requirements:

CMake >= 3.15
A C++20 compiler (GCC/Clang)

Quick build

chmod +x build.sh
./build.sh

After building, this repository produces build/libspheni.a. You only need the public header (include/spheni.h) and the static library (libspheni.a) to consume Spheni in another project.

Usage

g++ -std=c++20 -O3 main.cpp -I /path/to/spheni/include /path/to/libspheni.a -o main

Example

Check out the examples/ folder for more.

#include "spheni.h"
#include <iostream>

int main() {
        // define
        spheni::IVFSpec spec{{3, spheni::Metric::Cosine, true}, 2, 1};
        spheni::IVFIndex index(spec);

        long long ids[] = {0, 1, 2};
        float vecs[] = {
            1.0f, 0.0f, 0.0f, 0.0f, 1.0f, 0.0f, 1.0f, 1.0f, 0.0f,
        };
        // train
        index.train(ids, vecs);

        float query[] = {1.0f, 0.2f, 0.0f};
        // search
        auto hits = index.search(query, 3);

        for (const auto &h : hits) {
                std::cout << h.id << " " << h.score << "\n";
        }
        return 0;
}

API Reference

Detailed public API documentation is available in docs/api-reference.md.

Benchmarks

Current Benchmark Report (single-core run, 200 queries, Recall@k-in-100).
A Legacy Report is also available.
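One common reading of Recall@k-in-R is the fraction of the true top-k neighbors that appear among the first R retrieved candidates. The sketch below follows that reading; it is an assumption about the metric, not the benchmark's actual code, and `recall_k_in_r` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Fraction of the true top-k ids found among the first r retrieved ids.
// `ground_truth` is ordered best-first (e.g. from an exact Flat search).
inline double recall_k_in_r(const std::vector<long long>& ground_truth,
                            const std::vector<long long>& retrieved,
                            std::size_t k, std::size_t r) {
    assert(k > 0 && ground_truth.size() >= k);
    const std::size_t limit = std::min(r, retrieved.size());
    const std::unordered_set<long long> candidates(
        retrieved.begin(),
        retrieved.begin() + static_cast<std::ptrdiff_t>(limit));

    std::size_t found = 0;
    for (std::size_t i = 0; i < k; ++i) {
        found += candidates.count(ground_truth[i]);
    }
    return static_cast<double>(found) / static_cast<double>(k);
}
```

For Recall@10-in-100 this would be called with `k = 10` and `r = 100` per query and averaged over all queries. The metric suits a candidate generator: it asks whether the true neighbors survive into the candidate pool, leaving exact ordering to a later re-ranking stage.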

Roadmap

Implement save/load for serialized data
Implement multithreading with OpenMP wherever applicable
Implement OPQ (tough for me)
Add SIMD vectorization

References

Papers

Blogs
