
[Bug?] Index retrieval is not self-consistent. #176

@ysig


Hi,

I was getting very poor scores with the autofaiss build_index method. Part of the problem may be that build_index does not save the filename that corresponds to each index entry. Separately, I noticed something very strange: the index is not self-consistent. If I query the index with the same embeddings I used to build it, the top-1 result is not always the query point itself, even though the reconstruction error reported when building the index is zero.
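Because build_index only sees one concatenated array, any row-to-file bookkeeping has to be done by the caller. A minimal sketch of one way to do that, assuming each .npy file holds one or more embeddings of the same dimension; the helper name concat_with_row_map is hypothetical and not part of autofaiss:

```python
import numpy as np


def concat_with_row_map(arrays, names):
    """Stack per-file embedding arrays and record which file each row came from.

    arrays: list of arrays shaped (n_i, d) or (d,); a 1-D array counts as one row
    names:  list of file names, in the same order as `arrays`
    Returns (embeddings, row_to_name) where row_to_name[r] is the file of row r.
    """
    blocks = [np.atleast_2d(a).astype("float32") for a in arrays]
    row_to_name = []
    for name, block in zip(names, blocks):
        # one entry per row, so positions stay aligned even if a file holds several rows
        row_to_name.extend([name] * block.shape[0])
    return np.concatenate(blocks, axis=0), row_to_name


# toy example: the second file holds two embeddings
arrays = [np.zeros(4), np.ones((2, 4)), np.full((1, 4), 2.0)]
names = ["a.npy", "b.npy", "c.npy"]
emb, row_to_name = concat_with_row_map(arrays, names)
print(emb.shape)    # (4, 4)
print(row_to_name)  # ['a.npy', 'b.npy', 'b.npy', 'c.npy']
```

A search result index i can then be mapped back with row_to_name[i]. Indexing a plain list of filenames only gives the right file when every file contributes exactly one row.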

import os
from os.path import join

import numpy as np
from autofaiss import build_index

data_path = 'images'
features_dir = 'features'

# list once, so row i of the index corresponds to files[i] (assumes one embedding per file)
files = sorted(os.listdir(data_path))
full = [join(data_path, f) for f in files]
embeddings = np.concatenate([np.load(f) for f in full], axis=0).astype('float32')
index = build_index(embeddings=embeddings, nb_cores=12, save_on_disk=False)[0]

how_many = 0
for f in os.listdir(features_dir):
    # faiss expects a 2-D float32 array of shape (n_queries, dim)
    query_vector = np.atleast_2d(np.load(join(features_dir, f))).astype('float32')
    distances, indices = index.search(query_vector, 1)
    # compare the top-1 neighbour of the first query row with the query's own file
    how_many += int(files[indices[0, 0]] != f)

print(how_many)

For my data this prints around 178 mismatches (out of 1,000,000 points).
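One possible cause worth ruling out, stated as an assumption and not verified against this index: if the index scores by inner product (I believe autofaiss's build_index defaults to metric_type="ip"), then even an exact search need not return a vector as its own nearest neighbour unless the embeddings are L2-normalized. A longer vector pointing in a similar direction can have a larger inner product with x than x has with itself. The following NumPy-only sketch uses toy random data and a hypothetical helper self_match_failures, with brute-force search and no faiss involved:

```python
import numpy as np


def self_match_failures(db, normalize):
    """Count rows whose exact top-1 inner-product neighbour is not the row itself."""
    x = db / np.linalg.norm(db, axis=1, keepdims=True) if normalize else db
    scores = x @ x.T              # exact inner products between all pairs
    top1 = scores.argmax(axis=1)  # exact max-inner-product search, k=1
    return int((top1 != np.arange(len(x))).sum())


rng = np.random.default_rng(0)
db = rng.normal(size=(1000, 8))  # toy, unnormalized embeddings

print(self_match_failures(db, normalize=False))  # typically > 0: longer vectors "win"
print(self_match_failures(db, normalize=True))   # 0: cosine self-similarity is maximal
```

If normalizing the embeddings before building and querying makes most mismatches disappear, the metric is the likely culprit rather than a bug in the index.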

Is this a bug, or do I need different input parameters to make this work properly?
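A second possible cause, also an assumption: if the index type autofaiss selected stores compressed codes (for example product quantization, common when memory limits are tight), different embeddings can end up with identical or reordered codes. In that case an exact self-match is no longer guaranteed, whereas an uncompressed flat index avoids this at the cost of memory. If I read the autofaiss docs correctly, build_index accepts an index_key argument for choosing the faiss index type explicitly; this is worth checking. The toy sketch that follows uses a crude fixed-grid scalar quantizer (hypothetical quantize helper, not what faiss does internally) only to illustrate the effect:

```python
import numpy as np


def quantize(x, levels):
    """Toy scalar quantizer: snap each coordinate in [0, 1) to the centre of one of `levels` bins."""
    bins = np.clip(np.floor(x * levels), 0, levels - 1)
    return (bins + 0.5) / levels


def self_match_failures_l2(db, stored):
    """Exact L2 search of each original vector against the stored (possibly compressed) vectors."""
    d2 = ((db[:, None, :] - stored[None, :, :]) ** 2).sum(axis=2)
    top1 = d2.argmin(axis=1)  # argmin returns the lowest index among ties
    return int((top1 != np.arange(len(db))).sum())


rng = np.random.default_rng(0)
db = rng.random((500, 4))

print(self_match_failures_l2(db, db))                      # exact storage: 0
print(self_match_failures_l2(db, quantize(db, levels=4)))  # coarse codes collide: > 0
```

Real quantizers are learned and much finer than this grid, so the mismatch rate is far lower, but the mechanism is the same: once two vectors share (or swap) their stored representation, the query's own row is no longer guaranteed to rank first.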

I use faiss==1.7.4 and autofaiss==2.15.8.

Thank you,
ysig
