NanoBEIREvaluator¶
Evaluate the performance of a PyLate Model on the NanoBEIR collection.
This is a direct extension of the NanoBEIREvaluator from the sentence-transformers library, leveraging the PyLateInformationRetrievalEvaluator class. The collection is a set of datasets based on the BEIR collection, but with a significantly smaller size, so it can be used to quickly evaluate the retrieval performance of a model before committing to a full evaluation. The evaluator returns the same metrics as the InformationRetrievalEvaluator (i.e., MRR, nDCG, Recall@k), for each dataset and on average.
Parameters¶
- dataset_names ('list[DatasetNameType] | None') – defaults to None
- mrr_at_k ('list[int]') – defaults to [10]
- ndcg_at_k ('list[int]') – defaults to [10]
- accuracy_at_k ('list[int]') – defaults to [1, 3, 5, 10]
- precision_recall_at_k ('list[int]') – defaults to [1, 3, 5, 10]
- map_at_k ('list[int]') – defaults to [100]
- show_progress_bar ('bool') – defaults to False
- batch_size ('int') – defaults to 32
- write_csv ('bool') – defaults to True
- truncate_dim ('int | None') – defaults to None
- score_functions ('dict[str, Callable[[Tensor, Tensor], Tensor]]') – defaults to None
- main_score_function ('str | SimilarityFunction | None') – defaults to None
- aggregate_fn ('Callable[[list[float]], float]') – defaults to np.mean
- aggregate_key ('str') – defaults to mean
- query_prompts ('str | dict[str, str] | None') – defaults to None
- corpus_prompts ('str | dict[str, str] | None') – defaults to None
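As a sketch of how these parameters can be combined, the snippet below builds an evaluator over a small subset of the collection with custom cutoffs; the dataset names and values shown here are only illustrative:

>>> from pylate import evaluation
>>> evaluator = evaluation.NanoBEIREvaluator(
...     dataset_names=["SciFact", "NFCorpus"],  # illustrative subset of the NanoBEIR datasets
...     ndcg_at_k=[10, 100],                    # report nDCG at both cutoffs
...     batch_size=16,
...     show_progress_bar=True,
... )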
Attributes¶
- description – Returns a human-readable description of the evaluator, derived from the class name (e.g., BinaryClassificationEvaluator -> Binary Classification): replace the "CE" prefix with "CrossEncoder", remove "Evaluator" from the class name, and add a space before every capital letter.
Examples¶
>>> from pylate import models, evaluation
>>> model = models.ColBERT(
... model_name_or_path="lightonai/colbertv2.0"
... )
>>> datasets = ["SciFact"]
>>> evaluator = evaluation.NanoBEIREvaluator(
...     dataset_names=datasets
... )
>>> results = evaluator(model)
>>> results
{'NanoSciFact_MaxSim_accuracy@1': 0.62, 'NanoSciFact_MaxSim_accuracy@3': 0.74, 'NanoSciFact_MaxSim_accuracy@5': 0.8, 'NanoSciFact_MaxSim_accuracy@10': 0.86, 'NanoSciFact_MaxSim_precision@1': 0.62, 'NanoSciFact_MaxSim_precision@3': 0.26666666666666666, 'NanoSciFact_MaxSim_precision@5': 0.18, 'NanoSciFact_MaxSim_precision@10': 0.096, 'NanoSciFact_MaxSim_recall@1': 0.595, 'NanoSciFact_MaxSim_recall@3': 0.715, 'NanoSciFact_MaxSim_recall@5': 0.79, 'NanoSciFact_MaxSim_recall@10': 0.85, 'NanoSciFact_MaxSim_ndcg@10': 0.7279903941189909, 'NanoSciFact_MaxSim_mrr@10': 0.6912222222222222, 'NanoSciFact_MaxSim_map@100': 0.6903374780806633, 'NanoBEIR_mean_MaxSim_accuracy@1': 0.62, 'NanoBEIR_mean_MaxSim_accuracy@3': 0.74, 'NanoBEIR_mean_MaxSim_accuracy@5': 0.8, 'NanoBEIR_mean_MaxSim_accuracy@10': 0.86, 'NanoBEIR_mean_MaxSim_precision@1': 0.62, 'NanoBEIR_mean_MaxSim_precision@3': 0.26666666666666666, 'NanoBEIR_mean_MaxSim_precision@5': 0.18, 'NanoBEIR_mean_MaxSim_precision@10': 0.096, 'NanoBEIR_mean_MaxSim_recall@1': 0.595, 'NanoBEIR_mean_MaxSim_recall@3': 0.715, 'NanoBEIR_mean_MaxSim_recall@5': 0.79, 'NanoBEIR_mean_MaxSim_recall@10': 0.85, 'NanoBEIR_mean_MaxSim_ndcg@10': 0.7279903941189909, 'NanoBEIR_mean_MaxSim_mrr@10': 0.6912222222222222, 'NanoBEIR_mean_MaxSim_map@100': 0.6903374780806633}
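The results are returned as a single flat dictionary, so individual metrics can be read by key: per-dataset keys are prefixed with the dataset name, and aggregated keys with NanoBEIR_ followed by the aggregate_key (mean by default). For example, using the output above:

>>> results["NanoSciFact_MaxSim_ndcg@10"]
0.7279903941189909
>>> results["NanoBEIR_mean_MaxSim_ndcg@10"]
0.7279903941189909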
Methods¶
call
This is called during training to evaluate the model. It returns a score for the evaluation with a higher score indicating a better result.
Args:
- model: the model to evaluate.
- output_path: path where predictions and metrics are written to.
- epoch: the epoch where the evaluation takes place. This is used for the file prefixes. If this is -1, then we assume evaluation on test data.
- steps: the steps in the current epoch at the time of the evaluation. This is used for the file prefixes. If this is -1, then we assume evaluation at the end of the epoch.
Returns: Either a score for the evaluation with a higher score indicating a better result, or a dictionary with scores. If the latter is chosen, then evaluator.primary_metric must be defined.
Parameters
- model ('SentenceTransformer')
- output_path ('str') – defaults to None
- epoch ('int') – defaults to -1
- steps ('int') – defaults to -1
- args
- kwargs
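A minimal sketch of passing these arguments when calling the evaluator directly, assuming the evaluator and model from the example above (the output directory name is hypothetical):

>>> results = evaluator(
...     model,
...     output_path="./eval",  # hypothetical directory; metrics are written to a CSV here since write_csv defaults to True
...     epoch=1,
...     steps=500,
... )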
get_config_dict
Return a dictionary with all meaningful configuration values of the evaluator to store in the model card.
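For instance, this can be used to inspect what will be reported in the model card; a minimal sketch, assuming the evaluator from the example above (the exact keys depend on the configured settings):

>>> config = evaluator.get_config_dict()
>>> isinstance(config, dict)
True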