WARP
WARP index using the xtr-warp-rs backend for high-performance multi-vector search.
Parameters
- index_folder ('str') – defaults to indexes
  The folder where the index will be stored.
- index_name ('str') – defaults to warp
  The name of the index.
- override ('bool') – defaults to False
  Whether to override the collection if it already exists.
- nbits ('int') – defaults to 4
  The number of bits to use for product quantization. Lower values mean more compression and potentially faster searches but can reduce accuracy.
- kmeans_niters ('int') – defaults to 4
  The number of iterations for the K-means algorithm used during index creation. This influences the quality of the initial centroid assignments.
- max_points_per_centroid ('int') – defaults to 256
  The maximum number of points (token embeddings) that can be assigned to a single centroid during K-means. Helps balance the clusters.
- n_samples_kmeans ('int | None') – defaults to None
  The number of samples to use for K-means clustering. If None, defaults to a value chosen by xtr-warp based on the number of documents.
- seed ('int') – defaults to 42
  Random seed for K-means reproducibility.
- use_triton ('bool | None') – defaults to None
  Whether to use Triton kernels when computing K-means. Triton kernels are faster but yield some variance due to race conditions; set to False for 100% reproducible results. If None, uses Triton when available on GPU.
- min_outliers ('int') – defaults to 50
  Minimum number of outlier embeddings required to trigger centroid expansion during incremental add_documents calls.
- max_growth_rate ('float') – defaults to 0.1
  Maximum ratio of new centroids relative to the existing codebook size during centroid expansion on incremental adds.
- n_ivf_probe ('int | None') – defaults to 32
  The number of inverted file list probes to perform during search. This parameter controls the number of clusters to search within the index for each query. Higher values improve recall but increase search time. Same parameter as n_ivf_probe on indexes.PLAID. If None, xtr-warp auto-tunes based on index characteristics.
- bound ('int | None') – defaults to None
  Number of centroids to consider per query token. If None, auto-tuned.
- t_prime ('int | None') – defaults to 100000
  Value for the t_prime scoring policy. If None, auto-tuned.
- max_candidates ('int | None') – defaults to None
  Maximum number of candidate documents to consider before the final sort. If None, auto-tuned.
- centroid_score_threshold ('float | None') – defaults to None
  Threshold on centroid scores (between 0 and 1) used to prune candidates during search. If None, auto-tuned.
- batch_size ('int') – defaults to 8192
  The internal batch size used when computing the query × centroids matmul during search.
- num_threads ('int | None') – defaults to 1
  Upper bound on threads for CPU search. Ignored on CUDA.
- show_progress ('bool') – defaults to True
  If set to True, a progress bar is displayed during indexing and search operations.
- device ('str | None') – defaults to None
  Device for computation (e.g. "cpu", "cuda", "cuda:0"). If None, defaults to "cuda" when available, else "cpu".
- dtype ('torch.dtype') – defaults to torch.float32
  Precision used for centroids and bucket weights when the index is loaded for search (e.g. torch.float32, torch.float16). Affects memory footprint and search speed.
- mmap ('bool') – defaults to True
  Memory-map large index tensors (codes and residuals) to reduce memory usage. Only supported on CPU.
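As a quick reference, the documented defaults can be gathered into a keyword dictionary. This is a minimal sketch that imports no library: the helper name `warp_kwargs` is hypothetical, and passing the result to the index constructor (for example as `indexes.WARP(**kwargs)`) is an assumption about how the class is exposed. Turning `mmap` off for CUDA devices is a choice made here for illustration, based only on the note that memory-mapping is supported on CPU.

```python
# Documented defaults from the parameter list above, gathered in one place.
# dtype is left out so the sketch needs no torch import; its documented default is torch.float32.
WARP_DEFAULTS = {
    "index_folder": "indexes",
    "index_name": "warp",
    "override": False,
    "nbits": 4,
    "kmeans_niters": 4,
    "max_points_per_centroid": 256,
    "n_samples_kmeans": None,
    "seed": 42,
    "use_triton": None,
    "min_outliers": 50,
    "max_growth_rate": 0.1,
    "n_ivf_probe": 32,
    "bound": None,
    "t_prime": 100000,
    "max_candidates": None,
    "centroid_score_threshold": None,
    "batch_size": 8192,
    "num_threads": 1,
    "show_progress": True,
    "device": None,
    "mmap": True,
}


def warp_kwargs(**overrides):
    """Hypothetical helper: merge overrides into the documented defaults."""
    unknown = set(overrides) - set(WARP_DEFAULTS)
    if unknown:
        raise TypeError(f"unknown WARP parameter(s): {sorted(unknown)}")
    kwargs = {**WARP_DEFAULTS, **overrides}
    # mmap is documented as CPU-only, so this sketch disables it for explicit CUDA devices.
    if str(kwargs["device"]).startswith("cuda"):
        kwargs["mmap"] = False
    return kwargs


# A reproducible CPU configuration: Triton off, default seed, four search threads.
cpu_kwargs = warp_kwargs(device="cpu", use_triton=False, num_threads=4)
```

Rejecting unknown keys keeps typos such as `n_ivf_probes` from being silently ignored.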
Methods
__call__
Query the index for the nearest neighbors of the query embeddings.
Parameters
- queries_embeddings ('np.ndarray | torch.Tensor | list[np.ndarray] | list[torch.Tensor]')
- k ('int') – defaults to 10
- subset ('list[list[str]] | list[str] | None') – defaults to None
Returns
list[list[RerankResult]]: List of lists containing RerankResult with 'id' and 'score' keys.
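The nested return value holds one inner list per query. The sketch below shows one way to consume it without running a real search: the `RerankResult` type is re-declared locally as an assumed shape (the text above only states that it has 'id' and 'score' keys), and `best_hits` is a hypothetical helper.

```python
from typing import Optional, TypedDict


class RerankResult(TypedDict):
    # Assumed shape: the reference only says each result has 'id' and 'score' keys.
    id: str
    score: float


def best_hits(results: list[list[RerankResult]]) -> list[Optional[str]]:
    """Return the id of the highest-scoring document for each query."""
    hits: list[Optional[str]] = []
    for per_query in results:
        if not per_query:
            hits.append(None)  # no document was returned for this query
            continue
        hits.append(max(per_query, key=lambda r: r["score"])["id"])
    return hits


# Hand-written stand-in for the return value of a two-query search with k=2.
results: list[list[RerankResult]] = [
    [{"id": "doc-3", "score": 21.5}, {"id": "doc-1", "score": 18.0}],
    [],
]
top = best_hits(results)
```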
add_documents
Add documents to the index.
On the first call this creates the WARP index. Subsequent calls use WARP's incremental add, which appends documents and may expand the centroid codebook if many new embeddings are outliers.
Parameters
- documents_ids ('str | list[str]')
- documents_embeddings ('list[np.ndarray | torch.Tensor]')
- kwargs
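To make the interaction between min_outliers and max_growth_rate concrete, here is a small sketch of the gating they describe. It is an illustration of the two documented limits only, not the backend's actual expansion logic; the function name `max_new_centroids` is hypothetical.

```python
def max_new_centroids(
    n_outliers: int,
    codebook_size: int,
    min_outliers: int = 50,
    max_growth_rate: float = 0.1,
) -> int:
    """Upper bound on centroids an incremental add may create (illustrative only)."""
    if n_outliers < min_outliers:
        return 0  # too few outliers: the codebook is left unchanged
    # Growth is capped relative to the size of the existing codebook.
    return int(codebook_size * max_growth_rate)


# With a 4096-centroid codebook and the default settings:
below_threshold = max_new_centroids(n_outliers=49, codebook_size=4096)
at_threshold = max_new_centroids(n_outliers=50, codebook_size=4096)
```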
get_documents_embeddings
Get document embeddings by their IDs.
Not supported: WARP stores embeddings in compressed/quantized form.
Parameters
- document_ids ('list[list[str]]')
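One way to see why the original vectors cannot simply be handed back is that quantization is lossy. The toy uniform scalar quantizer below is an illustration under simplifying assumptions (values clipped to a fixed range, one code per value); it is not WARP's actual codec, and the function names are hypothetical.

```python
def quantize(values, nbits=4, lo=-1.0, hi=1.0):
    """Map floats in [lo, hi] to integer codes in [0, 2**nbits - 1]."""
    levels = 2 ** nbits
    step = (hi - lo) / (levels - 1)
    codes = []
    for v in values:
        clipped = min(max(v, lo), hi)  # values outside the range are clamped
        codes.append(round((clipped - lo) / step))
    return codes, step


def dequantize(codes, step, lo=-1.0):
    """Recover approximate floats from the integer codes."""
    return [lo + c * step for c in codes]


original = [0.12, -0.57, 0.93]
codes, step = quantize(original)
restored = dequantize(codes, step)
# The round trip does not return the original values, only the nearest bucket.
max_error = max(abs(a - b) for a, b in zip(original, restored))
```

With nbits=4 there are only 16 possible codes per value, so the reconstruction error can be as large as half a bucket width.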
remove_documents
Remove documents from the index.
Uses WARP's tombstone deletion followed by an immediate compaction so that disk space is reclaimed and tombstoned passages are physically removed on every call.
Parameters
- documents_ids ('list[str]')
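The two-step pattern described above (mark as deleted, then compact) can be sketched with a small in-memory store. `ToyStore` is a hypothetical class written for this illustration and does not reflect how the backend lays out its files.

```python
class ToyStore:
    """Hypothetical in-memory store showing tombstone deletion plus compaction."""

    def __init__(self):
        self.ids = []            # passage ids, in storage order
        self.rows = []           # one payload per passage
        self.tombstones = set()  # ids marked deleted but still physically stored

    def add(self, doc_id, row):
        self.ids.append(doc_id)
        self.rows.append(row)

    def mark_deleted(self, doc_ids):
        # Step 1: cheap logical delete, nothing is moved yet.
        self.tombstones.update(doc_ids)

    def compact(self):
        # Step 2: rewrite storage without the tombstoned entries.
        kept = [(i, r) for i, r in zip(self.ids, self.rows) if i not in self.tombstones]
        self.ids = [i for i, _ in kept]
        self.rows = [r for _, r in kept]
        self.tombstones.clear()

    def remove_documents(self, doc_ids):
        # Mirrors the documented behaviour: tombstone, then compact immediately.
        self.mark_deleted(doc_ids)
        self.compact()


store = ToyStore()
for doc_id in ["a", "b", "c"]:
    store.add(doc_id, f"payload-{doc_id}")
store.remove_documents(["b"])
```

Because compaction runs on every call, passing many IDs in a single call is likely cheaper than removing them one at a time.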
update_documents
Update document embeddings in-place, preserving passage IDs.
More efficient than delete + add when re-indexing changed documents.
Parameters
- documents_ids ('list[str]')
- documents_embeddings ('list[np.ndarray | torch.Tensor]')
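The difference between an in-place update and delete + add can be pictured with a toy slot table. This is only a sketch of the idea of preserving passage IDs; the slot layout and both function names are hypothetical and say nothing about the backend's internals.

```python
def update_in_place(slots, embeddings, doc_id, new_embedding):
    """Overwrite the embedding for doc_id while keeping its slot (passage ID)."""
    embeddings[slots[doc_id]] = new_embedding


def delete_then_add(slots, embeddings, doc_id, new_embedding):
    """Drop the old slot and append a new one, so the passage ID changes."""
    embeddings[slots.pop(doc_id)] = None  # the old slot becomes a hole
    embeddings.append(new_embedding)
    slots[doc_id] = len(embeddings) - 1


# In-place update: document "a" keeps slot 0.
slots_a = {"a": 0, "b": 1}
emb_a = ["emb-a", "emb-b"]
update_in_place(slots_a, emb_a, "a", "emb-a-v2")

# Delete + add: document "a" moves to a new slot and leaves a hole behind.
slots_b = {"a": 0, "b": 1}
emb_b = ["emb-a", "emb-b"]
delete_then_add(slots_b, emb_b, "a", "emb-a-v2")
```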