TachiomIndex

TACHIOM index for late-interaction multi-vector retrieval.

Wraps the Rust-backed TACHIOM library (TAC + PQ + HNSW) as a drop-in PyLate index. Token-Aware Clustering groups token embeddings by vocabulary ID before k-means, which improves clustering speed and retrieval quality over standard k-means.

Encode documents with output_value=None to enable Token-Aware Clustering. The returned dict (keys: "token_embeddings", "input_ids", "masks", "attention_mask") carries vocabulary token IDs that add_documents extracts automatically:

    embeddings = model.encode(docs, is_query=False, output_value=None)
    index.add_documents(doc_ids, embeddings)

If documents_token_ids is omitted, all tokens are assigned ID 0 so TAC degrades to a single global k-means. A UserWarning is issued.
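
To illustrate why missing token IDs matter, the grouping step that precedes k-means can be sketched in plain Python. This is a minimal sketch and not the library's Rust implementation: group_by_token_id is a hypothetical helper, and the vectors and vocabulary IDs are made up.

```python
from collections import defaultdict


def group_by_token_id(vectors, token_ids):
    """Group token vectors by vocabulary ID (hypothetical helper, illustration only)."""
    if len(vectors) != len(token_ids):
        raise ValueError("vectors and token_ids must have the same length")
    groups = defaultdict(list)
    for vector, token_id in zip(vectors, token_ids):
        groups[token_id].append(vector)
    return dict(groups)


# Four toy 2-d token vectors with made-up vocabulary IDs.
vectors = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.7, 0.3]]
token_ids = [2054, 2054, 1996, 3007]

# With real token IDs, each vocabulary ID gets its own group to cluster.
with_ids = group_by_token_id(vectors, token_ids)

# Without token IDs every vector falls back to ID 0: a single global group.
without_ids = group_by_token_id(vectors, [0] * len(vectors))

print({tid: len(vs) for tid, vs in with_ids.items()})
print({tid: len(vs) for tid, vs in without_ids.items()})
```

With a single group, clustering no longer benefits from the per-token partitioning, which is the degradation the warning refers to.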

Parameters

  • index_folder ('str') – defaults to indexes

    Directory that will contain the index sub-folder.

  • index_name ('str') – defaults to tachiom

    Name of the index sub-folder inside index_folder.

  • override ('bool') – defaults to False

    Delete and recreate the index directory if it already exists.

  • center_dataset ('bool') – defaults to True

    Subtract the global mean token vector from all document vectors before building the index. Default: True. May improve HNSW quality.

  • total_centroids ('int | None') – defaults to None

    TAC coarse-centroid budget. None (default) auto-computes as max(2^round(log2(n_tokens/128)), ceil(min_tac_budget * 1.1)), ensuring TAC is used rather than falling back to global k-means.

  • tac_n_iter ('int') – defaults to 10

    K-means iterations for Token-Aware Clustering. Default: 10. Reduce for fast experimentation; raise for maximum quality. 10 is usually enough.

  • tac_micro_threshold ('int | None') – defaults to None

    Token groups with fewer vectors than this receive 1 centroid each. None (default) auto-derives as 2^round(log2(n_tokens^0.25)) clamped to [32, 128].

  • tac_small_threshold ('int | None') – defaults to None

    Token groups in [micro, small) receive 2 centroids each. None (default) auto-derives as 2 * tac_micro_threshold.

  • pq_sample_size ('int') – defaults to 10000000

    Maximum number of vectors sampled for PQ codebook training. Default: 10,000,000. Safe to lower on small corpora. May be increased on very large datasets (> 1B tokens).

  • pq_n_iter ('int') – defaults to 10

    K-means iterations for PQ codebook training. Default: 10. Same trade-off as tac_n_iter.

  • normalize ('bool') – defaults to True

    L2-normalise residuals before PQ encoding. Default: True. Leave as True unless you have a specific reason to disable.

  • pq_seed ('int') – defaults to 42

    Random seed for PQ codebook training. Default: 42. Fix for reproducibility; change to get a different codebook.

  • hnsw_m ('int') – defaults to 32

    HNSW graph degree (edges per node). Default: 32. Higher = better recall and more memory. Typical range: 16–64.

  • ef_construction ('int') – defaults to 1500

    HNSW build-time search width. Default: 1500. Deliberately high to maximise recall: up to 1–2M centroids, the HNSW build time is negligible compared to TAC and PQ. For very large centroid counts (> 2M), reduce it to keep build times reasonable. Values above 1500 rarely bring further benefit.

  • k_centroids ('int') – defaults to 20

    Coarse centroids probed per query token at search time (n_probe in most IVF-based algorithms). Default: 20. Higher = more candidates, better recall, slower search.

  • k_docs_to_score ('int') – defaults to 500

    Candidate pool size for full late-interaction MaxSim scoring. Default: 500. Must be ≥ k. Alpha pruning may further reduce this pool before MaxSim. Increase for higher recall at the cost of latency.

  • ef_search ('int | None') – defaults to None

    HNSW search-time exploration width. None (default) resolves to round(1.5 × k_centroids) at search time, keeping the two coupled automatically. Set explicitly only to deviate from the 1.5× rule.

  • alpha ('float | None') – defaults to 0.45

    Coarse-score pruning threshold: candidates whose coarse score falls more than alpha × score[k] below the k-th best are dropped before MaxSim. Default: 0.45. Range [0, 1]; usually effective in [0, 0.5]. Smaller = more aggressive pruning = faster search but worse recall. None disables pruning (all k_docs_to_score candidates scored).

  • beta ('int | None') – defaults to None

    Early-termination patience: stop MaxSim scoring after this many consecutive non-improving documents. None = disabled (score all).

  • lambda_ ('float | None') – defaults to None

    HNSW early-exit parameter. Makes search faster; tune together with ef_search. None = disabled.

  • num_threads ('int') – defaults to 0

    Worker threads for batch_search. 0 = rayon default (all cores), 1 = single-threaded, n = custom pool of size n.
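
The auto-derived defaults described above (total_centroids, tac_micro_threshold, tac_small_threshold, ef_search) can be reproduced with a few lines of arithmetic. The following is a minimal sketch of the documented formulas, not the library's code: the function names are hypothetical, min_tac_budget is treated as a plain input because this page does not say how it is computed, and rounding halves up is an assumed rounding mode.

```python
import math


def round_half_up(x):
    """Round to the nearest integer, rounding halves up (assumed rounding mode)."""
    return math.floor(x + 0.5)


def auto_total_centroids(n_tokens, min_tac_budget):
    """max(2^round(log2(n_tokens / 128)), ceil(min_tac_budget * 1.1))."""
    power_of_two = 2 ** round_half_up(math.log2(n_tokens / 128))
    return max(power_of_two, math.ceil(min_tac_budget * 1.1))


def auto_micro_threshold(n_tokens):
    """2^round(log2(n_tokens^0.25)), clamped to [32, 128]."""
    raw = 2 ** round_half_up(math.log2(n_tokens ** 0.25))
    return min(128, max(32, raw))


def auto_small_threshold(micro_threshold):
    """Twice the micro threshold."""
    return 2 * micro_threshold


def auto_ef_search(k_centroids):
    """round(1.5 * k_centroids)."""
    return round_half_up(1.5 * k_centroids)


n_tokens = 10_000_000  # made-up corpus size, in token vectors
micro = auto_micro_threshold(n_tokens)

print(auto_total_centroids(n_tokens, min_tac_budget=30_000))  # 65536
print(micro)                                                  # 64
print(auto_small_threshold(micro))                            # 128
print(auto_ef_search(20))                                     # 30
```

Passing explicit values for any of these parameters bypasses the corresponding formula.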

Methods

__call__

Search the index for the nearest documents to each query.
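
The final ranking stage relies on late-interaction MaxSim scoring. As a reminder of what that score computes, here is a minimal pure-Python sketch, assuming dot-product similarity as in ColBERT-style models. The helper names and toy vectors are illustrative and are not part of the index API.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim(query_tokens, doc_tokens):
    """Sum, over query tokens, of the best dot product against any document token."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Toy query with two token vectors and two toy documents.
query = [[1.0, 0.0], [0.0, 1.0]]
docs = {
    "doc_a": [[1.0, 0.0], [0.0, 1.0]],  # matches each query token exactly
    "doc_b": [[0.6, 0.8]],              # one token, partially matching both
}

ranking = sorted(docs, key=lambda doc_id: maxsim(query, docs[doc_id]), reverse=True)
print(ranking)  # ['doc_a', 'doc_b']
```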

Parameters

  • queries_embeddings ('np.ndarray | torch.Tensor | list[np.ndarray] | list[torch.Tensor]')
  • k ('int') – defaults to 10
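
How the constructor parameters alpha and beta interact with k at search time can be sketched as follows. This is a hedged illustration of the behaviour described in the parameter list, not the library's implementation: both function names are hypothetical, coarse scores are assumed positive, and "non-improving" is assumed to mean "fails to enter the current top-k".

```python
import heapq


def alpha_prune(coarse_scores, k, alpha):
    """Drop candidates more than alpha * score[k] below the k-th best coarse score.

    coarse_scores is a list of (doc_id, score) pairs; alpha=None disables pruning.
    """
    ranked = sorted(coarse_scores, key=lambda item: item[1], reverse=True)
    if alpha is None or len(ranked) <= k:
        return ranked
    kth_score = ranked[k - 1][1]
    cutoff = kth_score - alpha * kth_score
    return [(doc_id, score) for doc_id, score in ranked if score >= cutoff]


def score_with_patience(candidates, exact_score, k, beta):
    """Score candidates in order; stop after beta consecutive documents
    that fail to enter the current top-k (assumed meaning of "non-improving")."""
    top = []  # min-heap of (score, doc_id)
    misses = 0
    for doc_id in candidates:
        score = exact_score(doc_id)
        if len(top) < k:
            heapq.heappush(top, (score, doc_id))
            misses = 0
        elif score > top[0][0]:
            heapq.heapreplace(top, (score, doc_id))
            misses = 0
        else:
            misses += 1
            if beta is not None and misses >= beta:
                break
    return sorted(top, reverse=True)


coarse = [("a", 10.0), ("b", 8.0), ("c", 7.0), ("d", 5.0), ("e", 3.0)]
print([d for d, _ in alpha_prune(coarse, k=2, alpha=0.25)])  # ['a', 'b', 'c']
print([d for d, _ in alpha_prune(coarse, k=2, alpha=0.0)])   # ['a', 'b']

exact_scores = {"a": 9.0, "b": 8.0, "c": 2.0, "d": 1.0, "e": 10.0}
order = ["a", "b", "c", "d", "e"]
print(score_with_patience(order, exact_scores.__getitem__, k=2, beta=2))
print(score_with_patience(order, exact_scores.__getitem__, k=2, beta=None))
```

In the last two calls, beta=2 stops before document e is scored, while beta=None scores every candidate and finds it. This is the latency versus recall trade-off the two parameters control.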

add_documents

Index a set of documents.

Parameters

  • documents_ids ('list[str]')
  • documents_embeddings ('list[np.ndarray | torch.Tensor]')
  • documents_token_ids ('list[np.ndarray] | None') – defaults to None
  • kwargs

get_documents_embeddings

Return approximate token embeddings for the requested documents.

Embeddings are reconstructed from stored PQ codes via approx = coarse_centroid + norm * PQ_residual and are therefore approximate (PQ lossy compression). When the index was built with center_dataset=True (the default), the dataset mean is added back so that the returned embeddings are in the original embedding space.
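
The reconstruction formula can be written out for a single token vector. The sketch below is illustrative only: reconstruct is a hypothetical function, plain lists stand in for arrays, and the numbers are made up. It assumes the decoded PQ residual is unit-length and that norm is the stored residual norm.

```python
import math


def reconstruct(coarse_centroid, pq_residual, norm, dataset_mean=None):
    """approx = coarse_centroid + norm * pq_residual, plus the dataset mean
    when the index was built with centering (hypothetical helper)."""
    approx = [c + norm * r for c, r in zip(coarse_centroid, pq_residual)]
    if dataset_mean is not None:
        approx = [a + m for a, m in zip(approx, dataset_mean)]
    return approx


coarse_centroid = [0.5, -0.25]
pq_residual = [0.6, 0.8]      # unit-length residual direction decoded from PQ codes
norm = 0.5                    # stored residual norm
dataset_mean = [0.125, 0.25]  # mean subtracted at build time (center_dataset=True)

centered = reconstruct(coarse_centroid, pq_residual, norm)
original_space = reconstruct(coarse_centroid, pq_residual, norm, dataset_mean)

print(centered)
print(original_space)
```

The result differs from the embedding that was originally indexed because the PQ residual is a quantised approximation.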

Parameters

  • documents_ids ('list[list[str]]')

Returns

list[list[np.ndarray]]: Approximate (PQ-reconstructed) token embeddings for the requested documents.

remove_documents

References

If you use TACHIOM in your research, please cite:

@misc{martinico2026efficientmultivectorretrievaltokenaware,
      title={Efficient Multivector Retrieval with Token-Aware Clustering and Hierarchical Indexing},
      author={Silvio Martinico and Franco Maria Nardini and Cosimo Rulli and Rossano Venturini},
      year={2026},
      eprint={2604.28142},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2604.28142},
}