WARP

WARP index using the xtr-warp-rs backend for high-performance multi-vector search.

Parameters

  • index_folder ('str') – defaults to indexes

    The folder where the index will be stored.

  • index_name ('str') – defaults to warp

    The name of the index.

  • override ('bool') – defaults to False

    Whether to overwrite the index if it already exists.

  • nbits ('int') – defaults to 4

    The number of bits to use for product quantization. Lower values mean more compression and potentially faster searches but can reduce accuracy.

  • kmeans_niters ('int') – defaults to 4

    The number of iterations for the K-means algorithm used during index creation. This influences the quality of the initial centroid assignments.

  • max_points_per_centroid ('int') – defaults to 256

    The maximum number of points (token embeddings) that can be assigned to a single centroid during K-means. Helps balance the clusters.

  • n_samples_kmeans ('int | None') – defaults to None

    The number of samples to use for K-means clustering. If None, defaults to a value chosen by xtr-warp based on the number of documents.

  • seed ('int') – defaults to 42

    Random seed for K-means reproducibility.

  • use_triton ('bool | None') – defaults to None

    Whether to use Triton kernels when computing K-means. Triton kernels are faster but yield some variance due to race conditions; set to False for 100% reproducible results. If None, uses Triton when available on GPU.

  • min_outliers ('int') – defaults to 50

    Minimum number of outlier embeddings required to trigger centroid expansion during incremental add_documents calls.

  • max_growth_rate ('float') – defaults to 0.1

    Maximum ratio of new centroids relative to the existing codebook size during centroid expansion on incremental adds.

  • n_ivf_probe ('int | None') – defaults to 32

    The number of inverted file list probes to perform during search. This parameter controls the number of clusters to search within the index for each query. Higher values improve recall but increase search time. Same parameter as n_ivf_probe on indexes.PLAID. If None, xtr-warp auto-tunes based on index characteristics.

  • bound ('int | None') – defaults to None

    Number of centroids to consider per query token. If None, auto-tuned.

  • t_prime ('int | None') – defaults to 100000

    Value for the t_prime scoring policy. If None, auto-tuned.

  • max_candidates ('int | None') – defaults to None

    Maximum number of candidate documents to consider before the final sort. If None, auto-tuned.

  • centroid_score_threshold ('float | None') – defaults to None

    Threshold on centroid scores (between 0 and 1) used to prune candidates during search. If None, auto-tuned.

  • batch_size ('int') – defaults to 8192

    The internal batch size used when computing the query × centroids matmul during search.

  • num_threads ('int | None') – defaults to 1

    Upper bound on threads for CPU search. Ignored on CUDA.

  • show_progress ('bool') – defaults to True

    If set to True, a progress bar is displayed during indexing and search operations.

  • device ('str | None') – defaults to None

    Device for computation (e.g. "cpu", "cuda", "cuda:0"). If None, defaults to "cuda" when available, else "cpu".

  • dtype ('torch.dtype') – defaults to torch.float32

    Precision used for centroids and bucket weights when the index is loaded for search (e.g. torch.float32, torch.float16). Affects memory footprint and search speed.

  • mmap ('bool') – defaults to True

    Memory-map large index tensors (codes and residuals) to reduce memory usage. Only supported on CPU.
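
A minimal construction sketch using the parameters listed above. It assumes the class is exposed as `indexes.WARP` in the `pylate` package (the mention of `indexes.PLAID` suggests this, but the import path is an assumption here). The `build_warp_index` helper is hypothetical and simply returns None when that package or class is not available.

```python
import importlib.util

# Keyword arguments taken from the parameter list above. Values that differ
# from the documented defaults are illustrative choices, not recommendations.
warp_kwargs = {
    "index_folder": "indexes",
    "index_name": "warp",
    "override": True,    # start from a fresh index
    "nbits": 4,
    "n_ivf_probe": 32,
    "device": "cpu",     # mmap is documented as CPU-only
    "mmap": True,
    "num_threads": 1,    # documented as ignored on CUDA
}


def build_warp_index(kwargs):
    """Hypothetical helper: build the index if `pylate` is importable.

    Assumes the class lives at `pylate.indexes.WARP`; returns None otherwise.
    """
    if importlib.util.find_spec("pylate") is None:
        return None
    from pylate import indexes

    warp_cls = getattr(indexes, "WARP", None)
    return warp_cls(**kwargs) if warp_cls is not None else None
```

Keeping the arguments in a plain dict makes it easy to log or compare configurations before an index is built.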

Methods

__call__

Query the index for the nearest neighbors of the query embeddings.

Parameters

  • queries_embeddings ('np.ndarray | torch.Tensor | list[np.ndarray] | list[torch.Tensor]')
  • k ('int') – defaults to 10
  • subset ('list[list[str]] | list[str] | None') – defaults to None

Returns

list[list[RerankResult]]: One list per query, each containing RerankResult entries with 'id' and 'score' keys.
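
A short sketch of how the returned structure might be consumed, assuming one inner list per query. The `results` value below is placeholder data, with plain dicts standing in for RerankResult entries, and `best_hit_per_query` is a hypothetical helper rather than part of the index API.

```python
# Placeholder results for two queries. Real entries are RerankResult objects,
# modelled here as plain dicts with the documented 'id' and 'score' keys.
results = [
    [{"id": "doc-3", "score": 27.4}, {"id": "doc-1", "score": 19.8}],
    [{"id": "doc-2", "score": 31.0}],
]


def best_hit_per_query(results):
    """Return the highest-scoring (id, score) pair per query, or None if empty."""
    best = []
    for hits in results:
        if not hits:
            best.append(None)
            continue
        top = max(hits, key=lambda hit: hit["score"])
        best.append((top["id"], top["score"]))
    return best


top_hits = best_hit_per_query(results)
```

The helper does not rely on the inner lists being sorted, since the ordering is not stated above.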

add_documents

Add documents to the index.

On the first call, this creates the WARP index. Subsequent calls use WARP's incremental add, which appends documents and may expand the centroid codebook if many of the new embeddings are outliers.
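
The interaction between `min_outliers` and `max_growth_rate` on incremental adds can be sketched as follows. This is an illustration of the documented semantics only: the function name, the rounding rule, and the way outliers are counted are assumptions, not the xtr-warp implementation.

```python
import math


def max_new_centroids(n_outliers, codebook_size, min_outliers=50, max_growth_rate=0.1):
    """Upper bound on centroids added during one incremental add (sketch).

    Expansion is skipped when fewer than `min_outliers` outlier embeddings
    are found; otherwise growth is capped at `max_growth_rate` times the
    existing codebook size. Rounding down is an assumption.
    """
    if n_outliers < min_outliers:
        return 0
    return math.floor(codebook_size * max_growth_rate)


# With the default settings and an existing codebook of 4096 centroids:
below_threshold = max_new_centroids(n_outliers=49, codebook_size=4096)
at_threshold = max_new_centroids(n_outliers=50, codebook_size=4096)
```

Under these assumptions, 49 outliers leave the codebook unchanged, while 50 or more allow at most 409 new centroids.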

Parameters

  • documents_ids ('str | list[str]')
  • documents_embeddings ('list[np.ndarray | torch.Tensor]')
  • kwargs

get_documents_embeddings

Get document embeddings by their IDs.

Not supported — WARP stores embeddings in compressed/quantized form.

Parameters

  • document_ids ('list[list[str]]')

remove_documents

Remove documents from the index.

Uses WARP's tombstone deletion followed by an immediate compaction so that disk space is reclaimed and tombstoned passages are physically removed on every call.
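
The tombstone-then-compact pattern described above can be illustrated with a toy in-memory store. `ToyTombstoneStore` is a hypothetical teaching aid and says nothing about how WARP lays out data on disk.

```python
class ToyTombstoneStore:
    """Toy store showing tombstone deletion followed by immediate compaction."""

    def __init__(self):
        self.rows = []           # list of (doc_id, payload) pairs
        self.tombstones = set()  # IDs marked as deleted but not yet purged

    def add(self, doc_id, payload):
        self.rows.append((doc_id, payload))

    def remove(self, doc_ids):
        # Step 1: logical delete, only mark the IDs.
        self.tombstones.update(doc_ids)
        # Step 2: compact right away so the marked rows are physically dropped.
        self.compact()

    def compact(self):
        self.rows = [row for row in self.rows if row[0] not in self.tombstones]
        self.tombstones.clear()


store = ToyTombstoneStore()
for doc_id in ("a", "b", "c"):
    store.add(doc_id, payload=None)
store.remove(["b"])
remaining_ids = [doc_id for doc_id, _ in store.rows]
```

Compacting on every call trades extra work per deletion for a store that never carries dead rows.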

Parameters

  • documents_ids ('list[str]')

update_documents

Update document embeddings in place, preserving passage IDs.

More efficient than delete + add when re-indexing changed documents.
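
A sketch of re-indexing a batch of changed documents through `update_documents`, using the parameter names documented for this method. `reindex_changed` and `RecordingIndex` are hypothetical: the latter is a stand-in that only records calls so the example runs without a real index, and the nested lists are placeholders for the NumPy arrays or tensors a real index expects.

```python
def reindex_changed(index, changed):
    """Push new embeddings for changed documents, keeping their IDs.

    `changed` maps a document ID to its new token-embedding matrix.
    """
    ids = list(changed)
    embeddings = [changed[doc_id] for doc_id in ids]
    index.update_documents(documents_ids=ids, documents_embeddings=embeddings)
    return ids


class RecordingIndex:
    """Stand-in for a WARP index that only records update calls."""

    def __init__(self):
        self.calls = []

    def update_documents(self, documents_ids, documents_embeddings):
        self.calls.append((documents_ids, documents_embeddings))


stub = RecordingIndex()
updated = reindex_changed(stub, {"doc-1": [[0.1, 0.2]], "doc-2": [[0.3, 0.4]]})
```

Batching all changed documents into a single call keeps IDs and embeddings aligned by position.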

Parameters

  • documents_ids ('list[str]')
  • documents_embeddings ('list[np.ndarray | torch.Tensor]')