TachiomIndex
TACHIOM index for late-interaction multi-vector retrieval.
Wraps the Rust-backed TACHIOM library (TAC + PQ + HNSW) as a drop-in PyLate index. Token-Aware Clustering groups token embeddings by vocabulary ID before k-means, which improves clustering speed and retrieval quality over standard k-means.
Encode documents with output_value=None to enable Token-Aware Clustering. The returned dict (keys: "token_embeddings", "input_ids", "masks", "attention_mask") carries vocabulary token IDs that add_documents extracts automatically:
embeddings = model.encode(docs, is_query=False, output_value=None)
index.add_documents(doc_ids, embeddings)
If documents_token_ids is omitted, all tokens are assigned ID 0 so TAC degrades to a single global k-means. A UserWarning is issued.
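The grouping step behind Token-Aware Clustering can be illustrated with a minimal sketch. It only shows how token vectors are bucketed by vocabulary ID before any per-group k-means would run, and what the ID-0 fallback looks like. The helper name group_by_token_id and the toy data are hypothetical; this is not the TACHIOM implementation.

```python
import numpy as np

def group_by_token_id(token_embeddings, input_ids):
    """Bucket token vectors by vocabulary ID (hypothetical helper, for illustration)."""
    groups = {}
    for vec, tok in zip(token_embeddings, input_ids):
        groups.setdefault(int(tok), []).append(vec)
    # One (n_vectors_in_group, dim) array per vocabulary ID
    return {tok: np.stack(vecs) for tok, vecs in groups.items()}

rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(6, 4)).astype(np.float32)  # 6 tokens, dim 4
input_ids = np.array([101, 2054, 2003, 2054, 102, 2003])       # toy vocabulary IDs

groups = group_by_token_id(token_embeddings, input_ids)
group_sizes = {tok: g.shape[0] for tok, g in groups.items()}

# Fallback case: with no token IDs, every token gets ID 0,
# so there is a single group and clustering is effectively global.
fallback = group_by_token_id(token_embeddings, np.zeros(6, dtype=np.int64))
```

With real token IDs the six vectors fall into four groups; in the fallback they all land in one group, which is why clustering then behaves like a single global k-means.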
Parameters
- index_folder ('str') – defaults to indexes
  Directory that will contain the index sub-folder.
- index_name ('str') – defaults to tachiom
  Name of the index sub-folder inside index_folder.
- override ('bool') – defaults to False
  Delete and recreate the index directory if it already exists.
- center_dataset ('bool') – defaults to True
  Subtract the global mean token vector from all document vectors before building the index. May improve HNSW quality.
- total_centroids ('int | None') – defaults to None
  TAC coarse-centroid budget. None auto-computes it as max(2^round(log2(n_tokens/128)), ceil(min_tac_budget * 1.1)), ensuring TAC is used rather than falling back to global k-means.
- tac_n_iter ('int') – defaults to 10
  K-means iterations for Token-Aware Clustering. Reduce for fast experimentation; raise for maximum quality. 10 is usually enough.
- tac_micro_threshold ('int | None') – defaults to None
  Token groups with fewer vectors than this receive 1 centroid each. None auto-derives it as 2^round(log2(n_tokens^0.25)), clamped to [32, 128].
- tac_small_threshold ('int | None') – defaults to None
  Token groups whose size falls in [micro, small) receive 2 centroids each. None auto-derives it as 2 * tac_micro_threshold.
- pq_sample_size ('int') – defaults to 10000000
  Maximum number of vectors sampled for PQ codebook training (10,000,000 by default). Safe to lower on small corpora; may be increased on very large datasets (> 1B tokens).
- pq_n_iter ('int') – defaults to 10
  K-means iterations for PQ codebook training. Same trade-off as tac_n_iter.
- normalize ('bool') – defaults to True
  L2-normalise residuals before PQ encoding. Leave as True unless you have a specific reason to disable it.
- pq_seed ('int') – defaults to 42
  Random seed for PQ codebook training. Fix it for reproducibility; change it to get a different codebook.
- hnsw_m ('int') – defaults to 32
  HNSW graph degree (edges per node). Higher = better recall and more memory. Typical range: 16–64.
- ef_construction ('int') – defaults to 1500
  HNSW build-time search width. Deliberately high to maximise recall: up to 1–2M centroids, HNSW build time is negligible compared with TAC and PQ. For very large centroid counts (> 2M), reduce it to keep build times reasonable. Values above 1500 rarely bring further benefit.
- k_centroids ('int') – defaults to 20
  Coarse centroids probed per query token at search time (n_probe in most IVF-based algorithms). Higher = more candidates, better recall, slower search.
- k_docs_to_score ('int') – defaults to 500
  Candidate pool size for full late-interaction MaxSim scoring. Must be ≥ k. Alpha pruning may further reduce this pool before MaxSim. Increase for higher recall at the cost of latency.
- ef_search ('int | None') – defaults to None
  HNSW search-time exploration width. None resolves to round(1.5 × k_centroids) at search time, keeping the two coupled automatically. Set it explicitly only to deviate from the 1.5× rule.
- alpha ('float | None') – defaults to 0.45
  Coarse-score pruning threshold: candidates whose coarse score falls more than alpha × score[k] below the k-th best are dropped before MaxSim. Range [0, 1]; usually effective in [0, 0.5]. Smaller = more aggressive pruning = faster search but worse recall. None disables pruning (all k_docs_to_score candidates are scored).
- beta ('int | None') – defaults to None
  Early-termination patience: stop MaxSim scoring after this many consecutive non-improving documents. None = disabled (score all).
- lambda_ ('float | None') – defaults to None
  HNSW early-exit parameter. Makes search faster; tune together with ef_search. None = disabled.
- num_threads ('int') – defaults to 0
  Worker threads for batch_search. 0 = rayon default (all cores), 1 = single-threaded, n = custom pool of size n.
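The auto-derived defaults described for total_centroids, tac_micro_threshold, tac_small_threshold and ef_search are plain arithmetic and can be written out as below. This is a sketch under stated assumptions: the function names are hypothetical and not part of the TACHIOM or PyLate API, min_tac_budget is not defined on this page and is simply taken as an input, and the tie-breaking of round is assumed to be round-half-up.

```python
import math

def auto_micro_threshold(n_tokens):
    # 2^round(log2(n_tokens^0.25)), clamped to [32, 128]
    raw = 2 ** round(math.log2(n_tokens ** 0.25))
    return min(128, max(32, raw))

def auto_small_threshold(micro_threshold):
    # 2 * tac_micro_threshold
    return 2 * micro_threshold

def auto_total_centroids(n_tokens, min_tac_budget):
    # max(2^round(log2(n_tokens/128)), ceil(min_tac_budget * 1.1))
    # min_tac_budget is an opaque input here (not documented on this page).
    return max(2 ** round(math.log2(n_tokens / 128)),
               math.ceil(min_tac_budget * 1.1))

def resolve_ef_search(k_centroids, ef_search=None):
    # An explicit ef_search wins; otherwise apply the 1.5x rule.
    if ef_search is not None:
        return ef_search
    return math.floor(1.5 * k_centroids + 0.5)  # round half up (assumption)

n_tokens = 100_000_000
micro = auto_micro_threshold(n_tokens)            # 128
small = auto_small_threshold(micro)               # 256
centroids = auto_total_centroids(n_tokens, 50_000)  # 1048576 (hypothetical budget)
ef = resolve_ef_search(20)                        # 30
```

For a corpus of 100M tokens the thresholds hit the upper clamp (128 and 256), and the default k_centroids of 20 yields an ef_search of 30.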
Methods
__call__
Search the index for the nearest documents to each query.
Parameters
- queries_embeddings ('np.ndarray | torch.Tensor | list[np.ndarray] | list[torch.Tensor]')
- k ('int') – defaults to 10
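Two of the search-time steps named in the parameter descriptions, alpha pruning and MaxSim scoring, can be sketched in NumPy. This is an illustrative reading, not the library's code: it assumes score[k] means the k-th best coarse score, that coarse scores are positive, and that similarity is a dot product. The function names alpha_prune and maxsim are hypothetical.

```python
import numpy as np

def alpha_prune(coarse_scores, k, alpha):
    """Return candidate indices (best first) that survive alpha pruning."""
    order = np.argsort(-coarse_scores)          # best coarse score first
    if alpha is None or len(order) <= k:
        return order                            # pruning disabled or nothing to prune
    kth = coarse_scores[order[k - 1]]           # k-th best coarse score
    # Drop candidates more than alpha * kth below the k-th best.
    keep = coarse_scores[order] >= kth - alpha * kth
    return order[keep]

def maxsim(query_tokens, doc_tokens):
    """Late-interaction score: for each query token take the best-matching
    document token, then sum over query tokens."""
    sims = query_tokens @ doc_tokens.T          # (n_query_tokens, n_doc_tokens)
    return float(sims.max(axis=1).sum())

coarse = np.array([10.0, 9.0, 8.0, 5.0, 3.0, 7.0])
kept_default = alpha_prune(coarse, k=2, alpha=0.45).tolist()  # threshold 4.95
kept_tight = alpha_prune(coarse, k=2, alpha=0.1).tolist()     # threshold 8.1
kept_all = alpha_prune(coarse, k=2, alpha=None).tolist()

query = np.array([[1.0, 0.0], [0.0, 1.0]])
doc = np.array([[1.0, 0.0], [0.5, 0.5]])
score = maxsim(query, doc)
```

Lowering alpha from 0.45 to 0.1 shrinks the surviving pool from five candidates to two, which matches the note that smaller values prune more aggressively. The beta and lambda_ early-exit mechanisms are not modelled here.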
add_documents
Index a set of documents.
Parameters
- documents_ids ('list[str]')
- documents_embeddings ('list[np.ndarray | torch.Tensor]')
- documents_token_ids ('list[np.ndarray] | None') – defaults to None
- kwargs
get_documents_embeddings
Return approximate token embeddings for the requested documents.
Embeddings are reconstructed from stored PQ codes via approx = coarse_centroid + norm * PQ_residual and are therefore approximate (PQ lossy compression). When the index was built with center_dataset=True (the default), the dataset mean is added back so that the returned embeddings are in the original embedding space.
Parameters
- documents_ids ('list[list[str]]')
Returns
list[list[np.ndarray]]: the approximate (PQ-reconstructed) token embeddings of the requested documents.
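The reconstruction formula approx = coarse_centroid + norm * PQ_residual can be made concrete with a small NumPy sketch. Only the formula and the mean add-back come from the description above; the storage layout is an assumption (one code per subspace, codebooks of shape (n_subspaces, n_codes, sub_dim), a scalar norm per token), and the function name reconstruct is hypothetical.

```python
import numpy as np

def reconstruct(coarse_centroid, pq_codes, codebooks, norm, dataset_mean=None):
    """Approximate one token embedding from its PQ codes (illustrative layout)."""
    # Look up one sub-vector per subspace and concatenate them into the residual.
    residual = np.concatenate(
        [codebooks[m, code] for m, code in enumerate(pq_codes)]
    )
    approx = coarse_centroid + norm * residual
    if dataset_mean is not None:
        # With center_dataset=True the mean is added back so the result
        # lives in the original embedding space.
        approx = approx + dataset_mean
    return approx

# Toy setup: dim 4 split into 2 subspaces of dim 2, 2 codes per subspace.
codebooks = np.array([
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [1.0, 1.0]],
])
centroid = np.ones(4)
codes = [1, 0]

centered = reconstruct(centroid, codes, codebooks, norm=2.0)
original_space = reconstruct(centroid, codes, codebooks, norm=2.0,
                             dataset_mean=np.full(4, 0.1))
```

Because the residual is looked up from a finite codebook, the result is only an approximation of the embedding that was originally indexed.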
remove_documents
References
- Martinico et al., "Efficient Multivector Retrieval with Token-Aware Clustering and Hierarchical Indexing", SIGIR 2026
- TACHIOM GitHub repository
If you use TACHIOM in your research, please cite:
@misc{martinico2026efficientmultivectorretrievaltokenaware,
  title={Efficient Multivector Retrieval with Token-Aware Clustering and Hierarchical Indexing},
  author={Silvio Martinico and Franco Maria Nardini and Cosimo Rulli and Rossano Venturini},
  year={2026},
  eprint={2604.28142},
  archivePrefix={arXiv},
  primaryClass={cs.IR},
  url={https://arxiv.org/abs/2604.28142},
}