PLAID

The PLAID index is the most scalable type of index for multi-vector search. It leverages PQ-IVF as well as custom kernels for decompression.
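To give an intuition for the compression behind an IVF-style index with quantized residuals, here is a toy sketch in pure Python. It is not the library's implementation: the helper names (`compress`, `decompress`), the fixed `residual_range`, and the uniform buckets are assumptions made for illustration only. Each vector is stored as the id of its nearest centroid plus a residual quantized to `nbits` bits per dimension.

```python
def dot(a, b):
    """Plain dot product between two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def compress(vector, centroids, nbits=2, residual_range=1.0):
    """Return (centroid_id, codes): nearest centroid plus quantized residual."""
    # Assign the vector to the centroid with the highest dot product.
    centroid_id = max(range(len(centroids)), key=lambda i: dot(vector, centroids[i]))
    levels = 2 ** nbits                 # number of buckets per dimension
    step = 2 * residual_range / levels  # width of one uniform bucket
    codes = []
    for value, center in zip(vector, centroids[centroid_id]):
        residual = value - center
        bucket = int((residual + residual_range) / step)
        codes.append(min(max(bucket, 0), levels - 1))  # clamp to a valid code
    return centroid_id, codes


def decompress(centroid_id, codes, centroids, nbits=2, residual_range=1.0):
    """Rebuild an approximate vector from the centroid and the bucket midpoints."""
    levels = 2 ** nbits
    step = 2 * residual_range / levels
    return [
        center + (-residual_range + (code + 0.5) * step)
        for center, code in zip(centroids[centroid_id], codes)
    ]


centroids = [[1.0, 0.0], [0.0, 1.0]]
vector = [0.9, 0.2]
centroid_id, codes = compress(vector, centroids)
restored = decompress(centroid_id, codes, centroids)
```

With more bits per dimension the buckets get narrower, so the reconstruction error shrinks at the cost of a larger index.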

Parameters

index_folder ('str') – defaults to indexes

The folder where the index will be stored.
index_name ('str') – defaults to colbert

The name of the index.
override ('bool') – defaults to False

Whether to overwrite the index if it already exists.
embedding_size ('int') – defaults to 128

The number of dimensions of the embeddings.
nbits ('int') – defaults to 2

The number of bits to use for the quantization.
nranks ('int') – defaults to 1
kmeans_niters ('int') – defaults to 4

The number of iterations to use for the k-means clustering.
index_bsize ('int') – defaults to 1
ndocs ('int') – defaults to 8192

The number of candidate documents.
centroid_score_threshold ('float') – defaults to 0.35

The score threshold for centroid pruning.
ncells ('int') – defaults to 8

The number of cells to consider for search.
search_batch_size ('int') – defaults to 262144

The batch size to use when searching.
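The following simplified sketch shows how `ncells` and `centroid_score_threshold` could interact during candidate selection. It rests on assumptions: the function name `select_centroids` is hypothetical, the staging is collapsed into a single pass, and the actual search kernels may order and combine these steps differently. For each query token, the `ncells` best-scoring centroids are probed; a probed centroid is then kept only if its best score against any query token reaches the threshold.

```python
def dot(a, b):
    """Plain dot product between two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def select_centroids(query_tokens, centroids, ncells=8, centroid_score_threshold=0.35):
    """Return (probed, kept) sets of centroid ids for one query."""
    probed = set()
    best_score = {}
    for token in query_tokens:
        scores = [dot(token, centroid) for centroid in centroids]
        ranked = sorted(range(len(centroids)), key=lambda i: scores[i], reverse=True)
        probed.update(ranked[:ncells])  # cells visited for this query token
        for i, score in enumerate(scores):
            best_score[i] = max(best_score.get(i, score), score)
    # Pruning: drop probed centroids that never score high enough.
    kept = {i for i in probed if best_score[i] >= centroid_score_threshold}
    return probed, kept


centroids = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8], [-1.0, 0.0]]
query_tokens = [[1.0, 0.0], [0.8, 0.6]]
probed, kept = select_centroids(
    query_tokens, centroids, ncells=3, centroid_score_threshold=0.7
)
```

In this toy setting, raising `ncells` widens the set of probed cells, while raising the threshold prunes more of them.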

Methods

__call__

Query the index for the nearest neighbors of the query embeddings.

Parameters

queries_embeddings ('np.ndarray | torch.Tensor')
k ('int') – defaults to 10
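As a reference point for what a top-`k` multi-vector query computes, here is a brute-force sketch in pure Python. It assumes late-interaction (MaxSim) scoring, which is an assumption of this example and not stated above; the names `maxsim` and `brute_force_search` are hypothetical, and the index itself relies on approximate search instead of scoring every document.

```python
def dot(a, b):
    """Plain dot product between two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_tokens, document_tokens):
    """Sum, over query tokens, of the best similarity with any document token."""
    return sum(
        max(dot(query_token, document_token) for document_token in document_tokens)
        for query_token in query_tokens
    )


def brute_force_search(query_tokens, documents, k=10):
    """Score every document and return the k best (document_id, score) pairs."""
    scored = [
        (document_id, maxsim(query_tokens, tokens))
        for document_id, tokens in documents.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


documents = {
    "a": [[1.0, 0.0], [0.0, 1.0]],
    "b": [[1.0, 0.0]],
    "c": [[-1.0, 0.0]],
}
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
results = brute_force_search(query_tokens, documents, k=2)
```

A brute-force scorer like this can serve as a ground truth when checking the recall of an approximate index on a small collection.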

add_documents

Add documents to the index.

Parameters

documents_ids ('str | list[str]')
documents_embeddings ('list[np.ndarray | torch.Tensor]')
batch_size ('int') – defaults to 2000
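The sketch below illustrates one way ids and embeddings could be grouped into chunks of `batch_size` before being added. The helper `iter_batches` is hypothetical and not part of the documented API; how the index batches documents internally is not specified here.

```python
def iter_batches(documents_ids, documents_embeddings, batch_size=2000):
    """Yield aligned (ids, embeddings) chunks of at most batch_size documents."""
    if isinstance(documents_ids, str):
        documents_ids = [documents_ids]  # accept a single id, as the signature allows
    if len(documents_ids) != len(documents_embeddings):
        raise ValueError("Each document id needs exactly one embedding matrix.")
    for start in range(0, len(documents_ids), batch_size):
        end = start + batch_size
        yield documents_ids[start:end], documents_embeddings[start:end]


ids = ["a", "b", "c"]
embeddings = [[[1.0, 0.0]], [[0.0, 1.0]], [[0.5, 0.5]]]  # one token matrix per document
batches = list(iter_batches(ids, embeddings, batch_size=2))
```

Keeping ids and embeddings aligned within each chunk is the main point: every entry of `documents_ids` must correspond to the embedding matrix at the same position.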

get_documents_embeddings

remove_documents

Remove documents from the index.

Parameters

documents_ids ('list[str]')