score_xtr

Score documents using XTR (ConteXtualized Token Retriever) scoring.

XTR scoring differs from ColBERT scoring in that it does not perform a full reranking pass. Instead, documents are scored using only the token scores from the initial retrieval, and missing token scores are imputed with the minimum retrieved score for each query token (min imputation), as in the original XTR paper.

Parameters

  • query_doc_ids ('list[list[str]]')

    List of length q_tok, where each element is a list of k_token document IDs retrieved for that query token.

  • query_scores ('list[list[float]]')

    List of length q_tok, where each element is a list of k_token scores corresponding to the retrieved document IDs.

  • k ('int')

    Number of top documents to return.

  • device ('str') – defaults to cpu

    Device to use for computation ('cpu', 'cuda', etc.).

Examples

>>> from pylate.rank import score_xtr
>>> query_doc_ids = [
...     ["doc1", "doc2", "doc3"],  # Retrieved for query token 0
...     ["doc2", "doc3", "doc4"],  # Retrieved for query token 1
... ]
>>> query_scores = [
...     [0.9, 0.7, 0.5],  # Scores for query token 0
...     [0.8, 0.6, 0.4],  # Scores for query token 1
... ]
>>> results = score_xtr(query_doc_ids, query_scores, k=3)
>>> assert len(results) == 3
>>> assert results[0]["id"] == "doc2"  # Has high scores for both tokens

Notes

The XTR scoring algorithm:

  1. For each document, sum scores across all query tokens.
  2. If a document's token was not retrieved for a query token, use the minimum retrieved score for that query token (min imputation).
  3. If multiple tokens from the same document were retrieved for a query token, use the maximum score.
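The three steps above can be sketched in plain Python. This is not PyLate's implementation, only an illustrative re-implementation (the helper name `score_xtr_sketch` is hypothetical) that reproduces the example from this page:

```python
from collections import defaultdict


def score_xtr_sketch(query_doc_ids, query_scores, k):
    """Illustrative XTR scoring with min imputation (not PyLate's code)."""
    # Step 3 first: per query token, keep the maximum score per document
    # (a document may have several tokens retrieved for one query token),
    # and record the token's minimum retrieved score for imputation.
    per_token = []
    all_docs = set()
    for doc_ids, scores in zip(query_doc_ids, query_scores):
        best = defaultdict(lambda: float("-inf"))
        for doc_id, score in zip(doc_ids, scores):
            best[doc_id] = max(best[doc_id], score)
        per_token.append((dict(best), min(scores)))  # (step 3, step 2)
        all_docs.update(best)

    # Steps 1 and 2: sum over query tokens, imputing the token's minimum
    # retrieved score when a document was not retrieved for that token.
    totals = {
        doc: sum(best.get(doc, min_score) for best, min_score in per_token)
        for doc in all_docs
    }
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [{"id": doc, "score": score} for doc, score in ranked[:k]]
```

On the example above, `doc2` scores 0.7 + 0.8 = 1.5, while `doc1` gets 0.9 plus the imputed minimum 0.4 from token 1, so `doc2` ranks first.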