PLAID
PLAID index. The PLAID index is the most scalable index type for multi-vector search. It leverages PQ-IVF as well as custom kernels for decompression.
Parameters
-
index_folder ('str') – defaults to
indexes
The folder where the index will be stored.
-
index_name ('str') – defaults to
colbert
The name of the index.
-
override ('bool') – defaults to
False
Whether to overwrite the collection if it already exists.
-
embedding_size ('int') – defaults to
128
The number of dimensions of the embeddings.
-
nbits ('int') – defaults to
2
The number of bits to use for the quantization.
-
nranks ('int') – defaults to
1
-
kmeans_niters ('int') – defaults to
4
The number of iterations to use for the k-means clustering.
-
index_bsize ('int') – defaults to
1
-
ndocs ('int') – defaults to
8192
The number of candidate documents.
-
centroid_score_threshold ('float') – defaults to
0.35
The score threshold used for centroid pruning.
-
ncells ('int') – defaults to
8
The number of cells to consider for search.
-
search_batch_size ('int') – defaults to
262144
The batch size to use when searching.
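To give a feel for how embedding_size and nbits interact, here is a rough back-of-the-envelope sketch of the compressed size of one token embedding. It assumes the ColBERTv2-style layout in which each embedding is stored as a centroid ID plus a residual quantized to nbits per dimension. The helper name and the 4-byte centroid ID are assumptions made for illustration, not values read from this library.

```python
import math


def estimate_bytes_per_embedding(
    embedding_size: int = 128,
    nbits: int = 2,
    centroid_id_bytes: int = 4,  # assumption: a 32-bit centroid ID
) -> int:
    """Approximate storage for one compressed token embedding (hypothetical helper)."""
    # Each dimension of the residual is quantized to `nbits` bits.
    residual_bytes = math.ceil(embedding_size * nbits / 8)
    return centroid_id_bytes + residual_bytes


print(estimate_bytes_per_embedding())         # defaults: 128 dimensions, 2 bits
print(estimate_bytes_per_embedding(nbits=1))  # coarser quantization, smaller index
```

Under these assumptions, lowering nbits shrinks the index at the cost of a coarser approximation of each embedding.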
Methods
call
Query the index for the nearest neighbors of the query embeddings.
Parameters
- queries_embeddings ('np.ndarray | torch.Tensor')
- k ('int') – defaults to
10
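As a conceptual illustration of what a top-k multi-vector query computes, the sketch below scores documents with ColBERT-style late interaction (MaxSim) by brute force. It is not how the index works internally, since PLAID prunes candidates through centroids and decompresses residuals rather than scanning every document. The function names and the toy data are hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_embeddings, document_embeddings):
    """Sum, over query tokens, of the best similarity with any document token."""
    return sum(
        max(dot(q, d) for d in document_embeddings) for q in query_embeddings
    )


def brute_force_top_k(query_embeddings, documents, k=10):
    """Score every document and keep the k highest (exhaustive, for illustration)."""
    scored = [
        (doc_id, maxsim_score(query_embeddings, embeddings))
        for doc_id, embeddings in documents.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy data: 2-dimensional token embeddings.
query = [[1.0, 0.0], [0.0, 1.0]]
documents = {
    "a": [[1.0, 0.0], [0.0, 1.0]],
    "b": [[1.0, 0.0]],
    "c": [[0.0, 0.5]],
}
top = brute_force_top_k(query, documents, k=2)
print(top)
```

An exhaustive scan like this grows linearly with the number of indexed tokens, which is the cost that parameters such as ndocs, ncells and centroid_score_threshold are meant to bound.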
add_documents
Add documents to the index.
Parameters
- documents_ids ('str | list[str]')
- documents_embeddings ('list[np.ndarray | torch.Tensor]')
- batch_size ('int') – defaults to
2000
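The sketch below shows one way a caller might think about the arguments above: a single ID is normalized to a list, IDs and embeddings are paired one to one, and the work is split into chunks of batch_size. It assumes batch_size is the number of documents handled per chunk and that the two inputs must have equal length. The helper iter_batches is hypothetical and is not part of the library.

```python
def iter_batches(documents_ids, documents_embeddings, batch_size=2000):
    """Yield (ids, embeddings) chunks of at most `batch_size` documents (hypothetical helper)."""
    # Accept a single ID as well as a list, mirroring the 'str | list[str]' type.
    if isinstance(documents_ids, str):
        documents_ids = [documents_ids]
    # Assumption: one embedding matrix per document ID.
    if len(documents_ids) != len(documents_embeddings):
        raise ValueError("documents_ids and documents_embeddings must have the same length")
    for start in range(0, len(documents_ids), batch_size):
        end = start + batch_size
        yield documents_ids[start:end], documents_embeddings[start:end]


# Toy data: each document is a list of 2-dimensional token embeddings.
ids = ["doc-1", "doc-2", "doc-3"]
embeddings = [[[0.1, 0.2]], [[0.3, 0.4]], [[0.5, 0.6]]]
batches = list(iter_batches(ids, embeddings, batch_size=2))
print([batch_ids for batch_ids, _ in batches])
```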
get_documents_embeddings
remove_documents
Remove documents from the index.
Parameters
- documents_ids ('list[str]')