WARP

WARP index using the xtr-warp-rs backend for high-performance multi-vector search.

Parameters

index_folder ('str') – defaults to indexes

The folder where the index will be stored.
index_name ('str') – defaults to warp

The name of the index.
override ('bool') – defaults to False

Whether to overwrite the index if it already exists.
nbits ('int') – defaults to 4

The number of bits to use for product quantization. Lower values mean more compression and potentially faster searches but can reduce accuracy.
kmeans_niters ('int') – defaults to 4

The number of iterations for the K-means algorithm used during index creation. This influences the quality of the initial centroid assignments.
max_points_per_centroid ('int') – defaults to 256

The maximum number of points (token embeddings) that can be assigned to a single centroid during K-means. Helps balance the clusters.
n_samples_kmeans ('int | None') – defaults to None

The number of samples to use for K-means clustering. If None, defaults to a value chosen by xtr-warp based on the number of documents.
seed ('int') – defaults to 42

Random seed for K-means reproducibility.
use_triton ('bool | None') – defaults to None

Whether to use Triton kernels when computing K-means. Triton kernels are faster, but race conditions make their results vary slightly between runs; set to False for fully reproducible results. If None, Triton is used when available on GPU.
min_outliers ('int') – defaults to 50

Minimum number of outlier embeddings required to trigger centroid expansion during incremental add_documents calls.
max_growth_rate ('float') – defaults to 0.1

Maximum ratio of new centroids relative to the existing codebook size during centroid expansion on incremental adds.
n_ivf_probe ('int | None') – defaults to 32

The number of inverted file (IVF) list probes to perform during search, i.e. how many clusters of the index are searched for each query. Higher values improve recall but increase search time. This is the same parameter as n_ivf_probe on indexes.PLAID. If None, xtr-warp auto-tunes it based on index characteristics.
bound ('int | None') – defaults to None

Number of centroids to consider per query token. If None, auto-tuned.
t_prime ('int | None') – defaults to 100000

Value for the t_prime scoring policy. If None, auto-tuned.
max_candidates ('int | None') – defaults to None

Maximum number of candidate documents to consider before the final sort. If None, auto-tuned.
centroid_score_threshold ('float | None') – defaults to None

Threshold on centroid scores (between 0 and 1) used to prune candidates during search. If None, auto-tuned.
batch_size ('int') – defaults to 8192

The internal batch size used when computing the query × centroids matmul during search.
num_threads ('int | None') – defaults to 1

Upper bound on threads for CPU search. Ignored on CUDA.
show_progress ('bool') – defaults to True

If set to True, a progress bar is displayed during indexing and search operations.
device ('str | None') – defaults to None

Device for computation (e.g. "cpu", "cuda", "cuda:0"). If None, defaults to "cuda" when available, else "cpu".
dtype ('torch.dtype') – defaults to torch.float32

Precision used for centroids and bucket weights when the index is loaded for search (e.g. torch.float32, torch.float16). Affects memory footprint and search speed.
mmap ('bool') – defaults to True

Memory-map large index tensors (codes and residuals) to reduce memory usage. Only supported on CPU.
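
To illustrate the recall/speed trade-off described for n_ivf_probe, here is a minimal pure-Python sketch of IVF probing. It is not the xtr-warp-rs implementation: the helper `probe_centroids`, the dot-product scoring, and the toy 2-dimensional centroids are all assumptions made for illustration.

```python
def probe_centroids(query_vector, centroids, n_ivf_probe):
    """Return the indices of the n_ivf_probe centroids closest to the query.

    Hypothetical helper: similarity is a plain dot product here, which is an
    assumption and not necessarily what the backend computes.
    """
    scores = [
        sum(q * c for q, c in zip(query_vector, centroid))
        for centroid in centroids
    ]
    # Rank centroid indices from most to least similar.
    ranked = sorted(range(len(centroids)), key=lambda i: scores[i], reverse=True)
    return ranked[:n_ivf_probe]


# Toy codebook of four centroids (made-up values).
centroids = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.7, 0.7],
    [-1.0, 0.0],
]
query_vector = [0.9, 0.1]

# A small probe count inspects fewer clusters (faster, lower recall).
probed_small = probe_centroids(query_vector, centroids, n_ivf_probe=1)
# A larger probe count inspects more clusters (slower, higher recall).
probed_large = probe_centroids(query_vector, centroids, n_ivf_probe=3)
```

Only the embeddings assigned to the probed clusters would then be scored, which is why a larger value costs more time but is less likely to miss relevant documents.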

Methods

__call__

Query the index for the nearest neighbors of the query embeddings.

Parameters

queries_embeddings ('np.ndarray | torch.Tensor | list[np.ndarray] | list[torch.Tensor]')
k ('int') – defaults to 10
subset ('list[list[str]] | list[str] | None') – defaults to None

Returns

list[list[RerankResult]]: List of lists containing RerankResult with 'id' and 'score' keys.
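
The return shape can be consumed as sketched below. The `RerankResult` class here is a local stand-in that only mirrors the documented 'id' and 'score' keys, and the sample results are hypothetical values, not output from a real index. The helpers `top_ids` and `best_hit` are illustrative names.

```python
from typing import TypedDict


class RerankResult(TypedDict):
    # Stand-in mirroring the documented keys; not the library's own class.
    id: str
    score: float


# Hypothetical output for two queries with k=2 (ids and scores are made up).
results: list[list[RerankResult]] = [
    [{"id": "doc-3", "score": 27.5}, {"id": "doc-1", "score": 21.0}],
    [{"id": "doc-2", "score": 30.2}, {"id": "doc-3", "score": 18.4}],
]


def top_ids(results):
    """Collect the document ids returned for each query, in returned order."""
    return [[hit["id"] for hit in hits] for hits in results]


def best_hit(results):
    """Pick the highest-scoring hit per query (None if a query has no hits)."""
    return [
        max(hits, key=lambda hit: hit["score"]) if hits else None
        for hits in results
    ]


ids_per_query = top_ids(results)
best_per_query = best_hit(results)
```

The outer list has one entry per query and each inner list holds at most k hits, so `results[i]` always corresponds to the i-th query embedding passed in.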

add_documents

Add documents to the index.

On the first call, this creates the WARP index. Subsequent calls use WARP's incremental add, which appends documents and may expand the centroid codebook if many new embeddings are outliers.

Parameters

documents_ids ('str | list[str]')
documents_embeddings ('list[np.ndarray | torch.Tensor]')
kwargs
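
The interaction between min_outliers and max_growth_rate during an incremental add can be sketched as follows. This is a reading of the parameter descriptions, not the backend's code: the function `max_new_centroids`, the inclusive threshold, and the rounding down are assumptions.

```python
import math


def max_new_centroids(n_outliers, codebook_size, min_outliers=50, max_growth_rate=0.1):
    """Upper bound on centroids added by one incremental add (illustrative).

    Assumptions: expansion triggers when n_outliers >= min_outliers, and the
    cap is codebook_size * max_growth_rate rounded down.
    """
    if n_outliers < min_outliers:
        # Too few outliers: the existing codebook is kept as is.
        return 0
    return math.floor(codebook_size * max_growth_rate)


# With a codebook of 1024 centroids and the documented defaults:
no_expansion = max_new_centroids(n_outliers=10, codebook_size=1024)
capped_expansion = max_new_centroids(n_outliers=500, codebook_size=1024)
```

Under these assumptions, a small batch of unusual embeddings leaves the codebook untouched, while a large batch can grow it by at most roughly 10% per call with the defaults.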

get_documents_embeddings

Get document embeddings by their IDs.

Not supported: WARP stores embeddings in compressed/quantized form, so the original embeddings cannot be returned.

Parameters

document_ids ('list[list[str]]')

remove_documents

Remove documents from the index.

Uses WARP's tombstone deletion followed by an immediate compaction so that disk space is reclaimed and tombstoned passages are physically removed on every call.

Parameters

documents_ids ('list[str]')
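
The tombstone-then-compact pattern described above can be sketched with a toy in-memory store. `ToyStore` and its methods are hypothetical and say nothing about the on-disk layout used by xtr-warp-rs; the sketch only shows the two phases.

```python
class ToyStore:
    """Toy store illustrating tombstone deletion followed by compaction."""

    def __init__(self):
        self.rows = []           # list of (doc_id, embedding) tuples
        self.tombstones = set()  # row positions marked as deleted

    def add(self, doc_id, embedding):
        self.rows.append((doc_id, embedding))

    def tombstone(self, doc_ids):
        # Phase 1: mark rows as deleted without moving any data.
        targets = set(doc_ids)
        for position, (doc_id, _) in enumerate(self.rows):
            if doc_id in targets:
                self.tombstones.add(position)

    def compact(self):
        # Phase 2: rewrite storage without the tombstoned rows.
        self.rows = [
            row for position, row in enumerate(self.rows)
            if position not in self.tombstones
        ]
        self.tombstones.clear()

    def remove(self, doc_ids):
        # Mirrors the documented behaviour: compaction runs on every call.
        self.tombstone(doc_ids)
        self.compact()


store = ToyStore()
for doc_id in ["a", "b", "c"]:
    store.add(doc_id, [0.0, 1.0])
store.remove(["b"])
remaining_ids = [doc_id for doc_id, _ in store.rows]
```

Compacting on every call keeps storage tight at the cost of a rewrite per removal, so batching many ids into one remove_documents call is likely cheaper than many single-id calls.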

update_documents

Update document embeddings in place, preserving passage IDs.

More efficient than delete + add when re-indexing changed documents.

Parameters

documents_ids ('list[str]')
documents_embeddings ('list[np.ndarray | torch.Tensor]')
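
The difference between an in-place update and delete + add can be sketched with a toy index that tracks internal passage IDs. `ToyIndex` is hypothetical, and the ID allocation scheme (a simple counter) is an assumption for illustration only.

```python
class ToyIndex:
    """Toy index mapping document ids to internal passage ids."""

    def __init__(self):
        self.pid_of = {}      # document id -> internal passage id
        self.embeddings = {}  # passage id -> embeddings
        self.next_pid = 0

    def add(self, doc_id, embedding):
        # A new document always receives a fresh passage id.
        pid = self.next_pid
        self.next_pid += 1
        self.pid_of[doc_id] = pid
        self.embeddings[pid] = embedding
        return pid

    def remove(self, doc_id):
        pid = self.pid_of.pop(doc_id)
        del self.embeddings[pid]

    def update(self, doc_id, embedding):
        # In-place update: the passage id is reused, only the payload changes.
        pid = self.pid_of[doc_id]
        self.embeddings[pid] = embedding
        return pid


index = ToyIndex()
pid_a = index.add("a", [0.1, 0.2])
pid_b = index.add("b", [0.3, 0.4])

# Updating keeps the passage id stable.
pid_after_update = index.update("a", [0.9, 0.8])

# Delete + add assigns a new passage id instead.
index.remove("a")
pid_after_readd = index.add("a", [0.5, 0.5])
```

In this sketch the update touches a single entry, whereas delete + add performs a removal and a fresh insertion and changes the passage ID.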