A high-performance columnar storage engine built on Apache Arrow, designed for vector databases and analytical workloads.
┌───────────────────────────────────────────────────────────────┬─────────────┐
│                       Application Layer                       │ Filesystem  │
│                 (Python / Java / Rust / C++)                  │ FFI (C ABI) │
└───────────────────────────────────────────────────────────────┴──────┬──────┘
                                       │                               │
                                       ▼                               │
┌─────────────────────────────────────────────────────────────────────────────┐
│                           FFI Layer (extern "C")                            │
│                     (Cross-language bindings via C ABI)                     │
└─────────────────────────────────────────────────────────────────────────────┘
                                       │                               │
                           ┌───────────┴───────────┐                   │
                           ▼                       ▼                   │
┌─────────────────────────────────────┐ ┌─────────────────────────────────────┐
│             Writer API              │ │             Reader API              │
│  ┌───────────────────────────────┐  │ │  ┌───────────────────────────────┐  │
│  │      Column Group Policy      │  │ │  │   RecordBatchReader (Scan)    │  │
│  │  (Single/Schema/Size Based)   │  │ │  │  ChunkReader (Random Access)  │  │
│  └───────────────────────────────┘  │ │  │      Take (Row Indices)       │  │
└─────────────────────────────────────┘ └─────────────────────────────────────┘
                           │                       │                   │
                           └───────────┬───────────┘                   │
                                       ▼                               │
┌─────────────────────────────────────────────────────────────────────────────┐
│                              Transaction Layer                              │
│          (Manifest Versioning / Conflict Resolution / Delta Logs)           │
└─────────────────────────────────────────────────────────────────────────────┘
                                       │                               │
                           ┌───────────┴───────────┐                   │
                           ▼                       ▼                   │
┌─────────────────────────────────────┐ ┌─────────────────────────────────────┐
│         Column Group Writer         │ │         Column Group Reader         │
│  ┌───────────────────────────────┐  │ │  ┌───────────────────────────────┐  │
│  │       Buffer Management       │  │ │  │       Chunk Management        │  │
│  │       Row Group Sizing        │  │ │  │       Column Projection       │  │
│  │         File Rolling          │  │ │  │      Predicate Pushdown       │  │
│  └───────────────────────────────┘  │ │  └───────────────────────────────┘  │
└─────────────────────────────────────┘ └─────────────────────────────────────┘
                           │                       │                   │
                           └───────────┬───────────┘                   │
                                       ▼                               │
┌─────────────────────────────────────────────────────────────────────────────┐
│                                Format Layer                                 │
│     ┌─────────────┐    ┌─────────────┐    ┌───────────────────────────┐     │
│     │   Parquet   │    │   Vortex    │    │     Lance (Read Only)     │     │
│     └─────────────┘    └─────────────┘    └───────────────────────────┘     │
└─────────────────────────────────────────────────────────────────────────────┘
                                       │                               │
                                       ▼                               ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                              Filesystem Layer                               │
│   ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐ ┌────────┐ ┌────────┐   │
│   │ Local │ │AWS S3 │ │  GCS  │ │ Azure │ │Aliyun │ │Tencent │ │ Huawei │   │
│   └───────┘ └───────┘ └───────┘ └───────┘ └───────┘ └────────┘ └────────┘   │
└─────────────────────────────────────────────────────────────────────────────┘
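
The Writer API box in the diagram names three column group policies (Single, Schema, Size Based). The following is a minimal Python sketch of what a single-group and a size-based policy might look like. It is not the engine's API: the function names `single_policy` and `size_based_policy`, the `max_avg_bytes` threshold, and the rule "wide columns get their own group" are all assumptions made for illustration.

```python
from typing import Dict, List


def single_policy(columns: List[str]) -> List[List[str]]:
    """Hypothetical 'single' policy: every column lands in one group."""
    return [list(columns)]


def size_based_policy(avg_bytes: Dict[str, int], max_avg_bytes: int) -> List[List[str]]:
    """Hypothetical 'size based' policy.

    Columns whose average value size exceeds max_avg_bytes are isolated in
    their own group; the remaining narrow columns share the first group.
    """
    narrow: List[str] = []
    groups: List[List[str]] = []
    for name, size in avg_bytes.items():
        if size > max_avg_bytes:
            groups.append([name])  # wide column: its own group
        else:
            narrow.append(name)    # narrow column: packed together
    if narrow:
        groups.insert(0, narrow)
    return groups


# Illustrative average sizes in bytes per value (made-up numbers).
columns = {"id": 8, "timestamp": 8, "embedding": 3072, "document": 1500}
print(single_policy(list(columns)))
print(size_based_policy(columns, max_avg_bytes=1024))
```

Under this assumed rule, a scan that only needs `id` and `timestamp` would not have to read the bytes of the wide `embedding` column.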
- Column Group Storage - Organize columns into groups for optimal I/O and compression
- Multi-Format Support - Parquet (primary), Vortex, and Lance (read-only) formats
- Transaction Support - ACID-like semantics with manifest versioning and conflict resolution
- Cloud Native - Built-in support for major cloud storage providers
- Multi-Language SDKs - Python, Java/Scala, Rust, and C++ bindings
- Encryption & Compression - Data-at-rest encryption and configurable compression
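
To make the "manifest versioning and conflict resolution" item more concrete, here is a small Python sketch of optimistic concurrency over a versioned manifest. It is an illustration only, not the project's implementation: `Manifest`, `ManifestLog`, `ConflictError`, and `commit_with_retry` are hypothetical names, and resolving a conflict by re-reading the latest manifest and retrying an append is an assumed strategy.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Manifest:
    """Immutable snapshot: a version number plus the data files it references."""
    version: int
    files: Tuple[str, ...] = ()


class ConflictError(Exception):
    """Raised when a commit is based on a manifest that is no longer the latest."""


class ManifestLog:
    """Hypothetical in-memory log of manifest versions."""

    def __init__(self) -> None:
        self._versions: List[Manifest] = [Manifest(version=0)]

    def latest(self) -> Manifest:
        return self._versions[-1]

    def commit(self, base_version: int, added_files: List[str]) -> Manifest:
        head = self.latest()
        if base_version != head.version:
            # Another writer committed after this writer read its base.
            raise ConflictError(f"base {base_version} != head {head.version}")
        new = Manifest(head.version + 1, head.files + tuple(added_files))
        self._versions.append(new)
        return new


def commit_with_retry(log: ManifestLog, added_files: List[str], max_attempts: int = 3) -> Manifest:
    """Assumed resolution strategy for appends: re-read the head and retry."""
    for _ in range(max_attempts):
        base = log.latest()
        try:
            return log.commit(base.version, added_files)
        except ConflictError:
            continue
    raise ConflictError("gave up after repeated conflicts")


log = ManifestLog()
base = log.latest()                                 # both writers read version 0
log.commit(base.version, ["group0_a.parquet"])      # writer A commits first
try:
    log.commit(base.version, ["group0_b.parquet"])  # writer B's base is stale
except ConflictError:
    commit_with_retry(log, ["group0_b.parquet"])    # writer B rebases and retries
print(log.latest())
```

Retrying is only safe here because both writers append files; updates or deletes that touch the same data would need a real conflict check before rebasing.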
| Provider | Status |
|---|---|
| AWS S3 | Supported (including S3-compatible: MinIO, Cloudflare R2) |
| Google Cloud Storage | Supported |
| Azure Blob Storage | Supported |
| Aliyun OSS | Supported |
| Tencent COS | Supported |
| Huawei Cloud OBS | Supported |
| Language | Status | Notes |
|---|---|---|
| C++ | Primary | Core implementation |
| Python | Supported | FFI bindings with PyArrow integration |
| Java/Scala | Supported | JNI bindings |
| Rust | Supported | DataFusion TableProvider integration |
- CMake >= 3.20.0
- C++17 compiler (GCC 8+, Clang 6+)
- Conan >= 1.60.0 and <= 2.0.0
git clone https://github.com/milvus-io/milvus-storage.git
cd milvus-storage/cpp
# Setup Conan remote
conan remote add default-conan-local https://milvus01.jfrog.io/artifactory/api/conan/default-conan-local --insert 0
# Build
make build
# Test
make test
# Test with minio
make test-all

| Option | Description |
|---|---|
| BUILD_TYPE=Debug/Release | Build type |
| WITH_AZURE_FS=ON | Azure filesystem support |
| WITH_JNI=ON | Java JNI bindings |
| WITH_PYTHON_BINDING=ON | Python bindings |
See python/tests/test_write_read.py for Python usage examples.
For integration with the legacy storage (packed interface), see cpp/test/packed/packed_integration_test.cpp.
make fix-format
make fix-tidy

Contributions are welcome. Please ensure that your code follows the project style and that all tests pass.
Apache License 2.0. See LICENSE for details.