PylateModelCardData
A dataclass for storing data used in the model card.
Parameters
- language ('str | list[str] | None') – defaults to <factory>
  The model language, either a string or a list of strings, e.g., "en" or ["en", "de", "nl"].
- license ('str | None') – defaults to None
  The license of the model, e.g., "apache-2.0", "mit", or "cc-by-nc-sa-4.0".
- model_name ('str | None') – defaults to None
  The pretty name of the model, e.g., "SentenceTransformer based on microsoft/mpnet-base".
- model_id ('str | None') – defaults to None
  The model ID for pushing the model to the Hub, e.g., "tomaarsen/sbert-mpnet-base-allnli".
- train_datasets ('list[dict[str, str]]') – defaults to <factory>
  A list of dictionaries containing names and/or Hugging Face dataset IDs for training datasets, e.g., [{"name": "SNLI", "id": "stanfordnlp/snli"}, {"name": "MultiNLI", "id": "nyu-mll/multi_nli"}, {"name": "STSB"}].
- eval_datasets ('list[dict[str, str]]') – defaults to <factory>
  A list of dictionaries containing names and/or Hugging Face dataset IDs for evaluation datasets, e.g., [{"name": "SNLI", "id": "stanfordnlp/snli"}, {"id": "mteb/stsbenchmark-sts"}].
- task_name ('str') – defaults to "semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more"
  The human-readable task the model is trained on, e.g., "semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more".
- tags ('list[str] | None') – defaults to <factory>
  A list of tags for the model, e.g., ["sentence-transformers", "sentence-similarity", "feature-extraction"].
- generate_widget_examples ("Literal['deprecated']") – defaults to deprecated
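A `<factory>` default means the field is built by a default factory, so every instance receives its own fresh value instead of sharing one mutable object. The sketch below illustrates this with a hypothetical stand-in dataclass, `CardDataSketch`, which is not the real `PylateModelCardData`. The field names mirror the parameter list; the assumption that each factory returns an empty list is for illustration only, and the real factories may produce other values.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CardDataSketch:
    """Hypothetical stand-in, NOT the real PylateModelCardData."""

    # Assumption: empty-list factories; the real defaults may differ.
    language: str | list[str] | None = field(default_factory=list)
    license: str | None = None
    model_name: str | None = None
    model_id: str | None = None
    train_datasets: list[dict[str, str]] = field(default_factory=list)
    eval_datasets: list[dict[str, str]] = field(default_factory=list)
    tags: list[str] | None = field(default_factory=list)


first = CardDataSketch(language="en", license="apache-2.0")
second = CardDataSketch(language=["en", "de", "nl"])

# Each instance owns its own list, so this append does not affect `second`.
first.train_datasets.append({"name": "SNLI", "id": "stanfordnlp/snli"})
```

Because the lists come from a factory, `second.train_datasets` stays empty after the append on `first`.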
Attributes
- base_model
- base_model_revision
- best_model_step
- code_carbon_callback
- license
- model
- model_id
- model_name
- predict_example
- trainer
Methods
add_tags
compute_dataset_metrics
Given a dataset, compute the following:
* Dataset Size
* Dataset Columns
* Dataset Stats
  - Strings: min, mean, max word count/token length
  - Integers: Counter() instance
  - Floats: min, mean, max range
  - List: number of elements or min, mean, max number of elements
* 3 Example samples
* Loss function name
  - Loss function config
Parameters
- dataset ('Dataset | IterableDataset | None')
- dataset_info ('dict[str, Any]')
- loss ('dict[str, nn.Module] | nn.Module | None')
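The per-type statistics listed for `compute_dataset_metrics` can be illustrated with a small standard-library sketch. This is not the library's implementation: `summarize_column` is a hypothetical helper, the rows are made-up sample data, and word counts here use plain whitespace splitting rather than a tokenizer.

```python
from collections import Counter
from statistics import mean


def summarize_column(values):
    """Hypothetical helper: summarize one column based on the type of its first value."""
    first = values[0]
    if isinstance(first, str):
        # Strings: min, mean, max word count (whitespace-separated words).
        counts = [len(value.split()) for value in values]
        return {"type": "string", "min": min(counts), "mean": mean(counts), "max": max(counts)}
    if isinstance(first, int):
        # Integers: frequency of each distinct value.
        return {"type": "int", "counts": Counter(values)}
    if isinstance(first, float):
        # Floats: min, mean, max of the raw values.
        return {"type": "float", "min": min(values), "mean": mean(values), "max": max(values)}
    if isinstance(first, list):
        # Lists: min, mean, max number of elements.
        lengths = [len(value) for value in values]
        return {"type": "list", "min": min(lengths), "mean": mean(lengths), "max": max(lengths)}
    return {"type": type(first).__name__}


# Made-up sample rows, for illustration only.
rows = [
    {"query": "what is late interaction", "label": 1, "score": 0.9},
    {"query": "colbert", "label": 0, "score": 0.1},
    {"query": "multi vector retrieval models", "label": 1, "score": 0.5},
]
columns = {name: [row[name] for row in rows] for name in rows[0]}
stats = {name: summarize_column(values) for name, values in columns.items()}
```

Here `stats["query"]` holds word-count bounds, `stats["label"]` holds a `Counter`, and `stats["score"]` holds the float range.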
extract_dataset_metadata
format_eval_metrics
Format the evaluation metrics for the model card.
The following keys will be returned:
- eval_metrics: A list of dictionaries containing the class name, description, dataset name, and a markdown table. This is used to display the evaluation metrics in the model card.
- metrics: A list of all metric keys. This is used in the model card metadata.
- model-index: A list of dictionaries containing the task name, task type, dataset type, dataset name, metric name, metric type, and metric value. This is used to display the evaluation metrics in the model card metadata.
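As a rough illustration of the `eval_metrics` and `metrics` keys, the sketch below renders a metric dictionary as a markdown table and collects the metric keys. Everything in it is an assumption made for illustration: `metrics_to_markdown` is a hypothetical helper, the inner dictionary key names and the evaluator name are invented, and the metric values are placeholders rather than real results. The `model-index` key is left out.

```python
def metrics_to_markdown(metrics):
    """Hypothetical helper: render {metric_name: value} as a two-column markdown table."""
    lines = ["| Metric | Value |", "|:-------|:------|"]
    for name, value in metrics.items():
        lines.append(f"| {name} | {value:.4f} |")
    return "\n".join(lines)


# Placeholder values, not real evaluation results.
example_metrics = {"ndcg@10": 0.5, "map": 0.25}

formatted = {
    "eval_metrics": [
        {
            # Assumed key names; the real dictionaries may use different ones.
            "class_name": "ExampleEvaluator",
            "description": "Example evaluation",
            "dataset_name": "example-dataset",
            "table": metrics_to_markdown(example_metrics),
        }
    ],
    # All metric keys, as used in the model card metadata.
    "metrics": list(example_metrics),
}
```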
format_training_logs
get
Get value for a given metadata key.
Parameters
- key (str)
- default (Any) – defaults to None
get_codecarbon_data
infer_datasets
pop
Pop value for a given metadata key.
Parameters
- key (str)
- default (Any) – defaults to None
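The difference between `get` and `pop` can be sketched with a plain dictionary, assuming both methods behave like their `dict` counterparts on the card metadata. That equivalence is an assumption and is not confirmed by this page; the keys and values below are illustrative.

```python
# Plain dict standing in for the card metadata (illustrative values).
metadata = {"license": "apache-2.0", "language": ["en"]}

# get: read a key, falling back to the default; the mapping is left unchanged.
license_value = metadata.get("license")
missing_value = metadata.get("datasets", "n/a")

# pop: read a key and remove it from the mapping.
language_value = metadata.pop("language")
# A second pop finds nothing, so the default is returned.
absent_value = metadata.pop("language", None)
```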
register_model
set_base_model
set_best_model_step
set_evaluation_metrics
set_label_examples
set_language
set_license
set_losses
set_model_id
set_widget_examples
to_dict
Converts CardData to a dict.
Returns
- dict: CardData represented as a dictionary ready to be dumped to a YAML block for inclusion in a README.md file.
to_yaml
Dumps CardData to a YAML block for inclusion in a README.md file.
Parameters
- line_break (str, optional) – defaults to None
  The line break to use when dumping to YAML.
Returns
- str: CardData represented as a YAML block.
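A YAML block of this kind is typically placed at the top of a README.md between `---` delimiters. The sketch below shows that step only. `wrap_front_matter` is a hypothetical helper that is not part of the library, and the YAML string is a hand-written stand-in for the output of `to_yaml`.

```python
def wrap_front_matter(yaml_block, body, line_break="\n"):
    """Hypothetical helper: put a YAML block between `---` delimiters above the README body."""
    return line_break.join(["---", yaml_block.strip(), "---", "", body])


# Hand-written stand-in for the string returned by to_yaml().
yaml_block = "language:\n- en\nlicense: apache-2.0\n"
readme = wrap_front_matter(yaml_block, "# My model")
```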