
PylateModelCardData

A dataclass for storing data used in the model card.

Parameters

  • language ('str | list[str] | None') – defaults to <factory>

    The model language, either a string or a list of strings, e.g., "en" or ["en", "de", "nl"].

  • license ('str | None') – defaults to None

    The license of the model, e.g., "apache-2.0", "mit", or "cc-by-nc-sa-4.0".

  • model_name ('str | None') – defaults to None

    The pretty name of the model, e.g., "SentenceTransformer based on microsoft/mpnet-base".

  • model_id ('str | None') – defaults to None

    The model ID for pushing the model to the Hub, e.g., "tomaarsen/sbert-mpnet-base-allnli".

  • train_datasets ('list[dict[str, str]]') – defaults to <factory>

    A list of dictionaries containing names and/or Hugging Face dataset IDs for training datasets, e.g., [{"name": "SNLI", "id": "stanfordnlp/snli"}, {"name": "MultiNLI", "id": "nyu-mll/multi_nli"}, {"name": "STSB"}].

  • eval_datasets ('list[dict[str, str]]') – defaults to <factory>

    A list of dictionaries containing names and/or Hugging Face dataset IDs for evaluation datasets, e.g., [{"name": "SNLI", "id": "stanfordnlp/snli"}, {"id": "mteb/stsbenchmark-sts"}].

  • task_name ('str') – defaults to "semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more"

    The human-readable task the model is trained on, e.g., "semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more".

  • tags ('list[str] | None') – defaults to <factory>

    A list of tags for the model, e.g., ["sentence-transformers", "sentence-similarity", "feature-extraction"].

  • generate_widget_examples ("Literal['deprecated']") – defaults to deprecated
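
Several parameters above default to <factory>, which is how dataclass documentation renders a field declared with a default factory: each instance gets its own fresh object rather than sharing one mutable default. The sketch below illustrates this with a hypothetical stand-in class, CardDataSketch, that copies a few of the documented field names. It is not the PyLate class, and the choice of list as the factory for each field is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class CardDataSketch:
    """Hypothetical stand-in with a few of the documented fields (not the PyLate class)."""

    # Mutable defaults use default_factory so each instance gets its own object.
    # Using `list` as the factory is an assumption made for this sketch.
    language: Union[str, list[str], None] = field(default_factory=list)
    license: Optional[str] = None
    model_name: Optional[str] = None
    model_id: Optional[str] = None
    train_datasets: list[dict[str, str]] = field(default_factory=list)
    eval_datasets: list[dict[str, str]] = field(default_factory=list)


first = CardDataSketch(language=["en", "de", "nl"], license="apache-2.0")
second = CardDataSketch()

# Appending to one instance's list leaves the other instance untouched.
first.train_datasets.append({"name": "SNLI", "id": "stanfordnlp/snli"})
print(first.train_datasets)
print(second.train_datasets)
```

Fields documented with a default of None, such as license, need no factory because None is immutable.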

Attributes

  • base_model

  • base_model_revision

  • best_model_step

  • code_carbon_callback

  • license

  • model

  • model_id

  • model_name

  • predict_example

  • trainer

Methods

add_tags
compute_dataset_metrics

Given a dataset, compute the following:

  • Dataset Size
  • Dataset Columns
  • Dataset Stats
      - Strings: min, mean, max word count/token length
      - Integers: Counter() instance
      - Floats: min, mean, max range
      - List: number of elements or min, mean, max number of elements
  • 3 Example samples
  • Loss function name
      - Loss function config

Parameters

  • dataset ('Dataset | IterableDataset | None')
  • dataset_info ('dict[str, Any]')
  • loss ('dict[str, nn.Module] | nn.Module | None')
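
The per-type dataset statistics listed for compute_dataset_metrics can be sketched in plain Python. This is only an illustration of the kind of summary described, not the library's implementation: summarize_column is a hypothetical helper, the returned dictionary keys are invented for the example, and word counts come from whitespace splitting because token lengths would require a tokenizer.

```python
from collections import Counter
from statistics import mean


def summarize_column(values):
    """Summarize one non-empty column, dispatching on the type of its first value.

    Hypothetical helper for illustration; the output keys are assumptions.
    """
    first = values[0]
    if isinstance(first, str):
        # Word counts via whitespace splitting (token lengths need a tokenizer).
        lengths = [len(v.split()) for v in values]
        return {"min": min(lengths), "mean": mean(lengths), "max": max(lengths)}
    if isinstance(first, int):
        # Integers (e.g. class labels) are summarized as value frequencies.
        return Counter(values)
    if isinstance(first, float):
        return {"min": min(values), "mean": mean(values), "max": max(values)}
    if isinstance(first, list):
        sizes = [len(v) for v in values]
        if len(set(sizes)) == 1:
            # Every list has the same size, so a single number is enough.
            return {"elements": sizes[0]}
        return {"min": min(sizes), "mean": mean(sizes), "max": max(sizes)}
    raise TypeError(f"Unsupported column type: {type(first).__name__}")


print(summarize_column(["a short text", "a slightly longer text here"]))
print(summarize_column([0, 1, 1, 2]))
```
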
extract_dataset_metadata
format_eval_metrics

Format the evaluation metrics for the model card.

The following keys will be returned:

  • eval_metrics: A list of dictionaries containing the class name, description, dataset name, and a markdown table. This is used to display the evaluation metrics in the model card.
  • metrics: A list of all metric keys. This is used in the model card metadata.
  • model-index: A list of dictionaries containing the task name, task type, dataset type, dataset name, metric name, metric type, and metric value. This is used to display the evaluation metrics in the model card metadata.

format_training_logs
get

Get value for a given metadata key.

Parameters

  • key (str)
  • default (Any) – defaults to None
get_codecarbon_data
infer_datasets
pop

Pop value for a given metadata key.

Parameters

  • key (str)
  • default (Any) – defaults to None
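
The get and pop methods share the same signature and read like their dict counterparts. The sketch below shows the difference between the two on a hypothetical MetadataSketch class. It assumes the metadata lives in the instance's attribute dictionary and that a missing key returns default, neither of which this page confirms.

```python
from typing import Any


class MetadataSketch:
    """Hypothetical stand-in showing dict-like get/pop over instance attributes."""

    def __init__(self, **metadata: Any) -> None:
        # Assumption: metadata keys are stored as plain instance attributes.
        self.__dict__.update(metadata)

    def get(self, key: str, default: Any = None) -> Any:
        # Read the value and leave it in place.
        return self.__dict__.get(key, default)

    def pop(self, key: str, default: Any = None) -> Any:
        # Read the value and remove the key.
        return self.__dict__.pop(key, default)


card = MetadataSketch(license="mit", language="en")
print(card.get("license"))         # value stays available afterwards
print(card.get("missing", "n/a"))  # falls back to the given default
```

Calling pop("license") on the same object would return the value once, after which get("license") would fall back to its default.
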
register_model
set_base_model
set_best_model_step
set_evaluation_metrics
set_label_examples
set_language
set_license
set_losses
set_model_id
set_widget_examples
to_dict

Converts CardData to a dict.

Returns: dict: CardData represented as a dictionary ready to be dumped to a YAML block for inclusion in a README.md file.

to_yaml

Dumps CardData to a YAML block for inclusion in a README.md file.

Args: line_break (str, optional): The line break to use when dumping to YAML.

Returns: str: CardData represented as a YAML block.

Parameters

  • line_break – defaults to None
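
To make the role of line_break concrete, here is a minimal sketch of turning a metadata dictionary into a YAML block and wrapping it as README front matter. Both functions are hypothetical and do not use a YAML library: the renderer only handles plain strings and lists of plain strings, does no quoting or escaping, and skipping None values is an assumption about how unset fields are treated.

```python
def to_yaml_sketch(data: dict, line_break: str = "\n") -> str:
    """Render a flat dict as a YAML block (plain strings and lists of strings only)."""
    lines = []
    for key, value in data.items():
        if value is None:
            # Assumption: unset fields are left out of the block.
            continue
        if isinstance(value, list):
            lines.append(f"{key}:")
            lines.extend(f"- {item}" for item in value)
        else:
            lines.append(f"{key}: {value}")
    return line_break.join(lines)


def as_front_matter(yaml_block: str, line_break: str = "\n") -> str:
    """Wrap a YAML block in the '---' delimiters used at the top of a README.md."""
    return f"---{line_break}{yaml_block}{line_break}---{line_break}"


metadata = {"language": ["en", "de"], "license": "mit", "model_name": None}
print(as_front_matter(to_yaml_sketch(metadata)))
```

Passing line_break="\r\n" to both functions would produce the same block with Windows-style line endings.
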
try_to_set_base_model
validate_datasets