PLAID

PLAID index. The PLAID index is the most scalable type of index for multi-vector search. It leverages PQ-IVF as well as custom kernels for decompression.

Parameters

  • index_folder ('str') – defaults to indexes

    The folder where the index will be stored.

  • index_name ('str') – defaults to colbert

    The name of the index.

  • override ('bool') – defaults to False

    Whether to overwrite the index if it already exists.

  • embedding_size ('int') – defaults to 128

    The number of dimensions of the embeddings.

  • nbits ('int') – defaults to 2

    The number of bits to use for the quantization.

  • nranks ('int') – defaults to 1

  • kmeans_niters ('int') – defaults to 4

    The number of iterations to use for the k-means clustering.

  • index_bsize ('int') – defaults to 1

  • ndocs ('int') – defaults to 8192

    The number of candidate documents.

  • centroid_score_threshold ('float') – defaults to 0.35

    The score threshold for centroid pruning.

  • ncells ('int') – defaults to 8

    The number of cells to consider for search.

  • search_batch_size ('int') – defaults to 262144

    The batch size to use when searching.
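
To make the role of nbits more concrete, here is a minimal conceptual sketch of residual quantization in plain Python. It assumes each embedding is stored as a centroid plus a per-dimension residual code of nbits bits. The helper names (quantize_residual, dequantize_residual), the max_abs clipping range, and the uniform bucket layout are all illustrative assumptions, not the library's actual kernels or bucket boundaries.

```python
def quantize_residual(vector, centroid, nbits=2, max_abs=1.0):
    """Encode (vector - centroid) as one integer code per dimension.

    Assumption: uniform buckets over [-max_abs, max_abs]. A real PLAID
    implementation may choose bucket boundaries differently.
    """
    levels = 2 ** nbits            # nbits=2 gives 4 buckets per dimension
    step = 2 * max_abs / levels    # width of one bucket
    codes = []
    for v, c in zip(vector, centroid):
        residual = min(max(v - c, -max_abs), max_abs)  # clip to the range
        code = int((residual + max_abs) // step)
        codes.append(min(code, levels - 1))            # keep the top edge in range
    return codes


def dequantize_residual(codes, centroid, nbits=2, max_abs=1.0):
    """Approximate the original vector from the centroid and bucket midpoints."""
    levels = 2 ** nbits
    step = 2 * max_abs / levels
    return [c + (-max_abs + (code + 0.5) * step) for code, c in zip(codes, centroid)]


# Hypothetical toy data: a 2-dimensional embedding and its nearest centroid.
embedding = [0.3, -0.2]
centroid = [0.1, 0.1]
codes = quantize_residual(embedding, centroid, nbits=2)
approx = dequantize_residual(codes, centroid, nbits=2)
print(codes, approx)
```

In this sketch, a larger nbits gives narrower buckets and a smaller reconstruction error, at the cost of a larger index.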

Methods

__call__

Query the index for the nearest neighbors of the query embeddings.
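
As a rough illustration of what "nearest neighbors" can mean for multi-vector queries, the sketch below scores documents by brute force with ColBERT-style late interaction (MaxSim). This page does not state the scoring function, so MaxSim is an assumption here. The helpers maxsim and top_k are hypothetical names, and the sketch skips the approximate candidate generation that makes an index like PLAID scalable.

```python
def maxsim(query_embeddings, document_embeddings):
    """Sum, over query tokens, of the best dot product with any document token."""
    return sum(
        max(
            sum(q * d for q, d in zip(q_vec, d_vec))
            for d_vec in document_embeddings
        )
        for q_vec in query_embeddings
    )


def top_k(query_embeddings, documents, k=10):
    """Return the k highest-scoring (document_id, score) pairs, best first."""
    scored = [
        (doc_id, maxsim(query_embeddings, embeddings))
        for doc_id, embeddings in documents.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy data: each document is a list of token embeddings.
query = [[1.0, 0.0], [0.0, 1.0]]
documents = {
    "a": [[1.0, 0.0], [0.0, 1.0]],
    "b": [[0.5, 0.5]],
    "c": [[-1.0, 0.0]],
}
print(top_k(query, documents, k=2))  # [('a', 2.0), ('b', 1.0)]
```

The exhaustive loop is only practical for small collections. An index avoids it by scoring only a pruned set of candidate documents.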

Parameters

  • queries_embeddings ('np.ndarray | torch.Tensor')
  • k ('int') – defaults to 10

add_documents

Add documents to the index.

Parameters

  • documents_ids ('str | list[str]')
  • documents_embeddings ('list[np.ndarray | torch.Tensor]')
  • batch_size ('int') – defaults to 2000

get_documents_embeddings

remove_documents

Remove documents from the index.

Parameters

  • documents_ids ('list[str]')