Available models

Tip

Following an update, models trained with the Stanford NLP ColBERT library or with RAGatouille should be natively compatible with PyLate, including their configurations. You can load such a model directly in PyLate:

from pylate import models

model = models.ColBERT(
    model_name_or_path="colbert-ir/colbertv2.0",
)
or, for a model whose repository ships custom modeling code:
model = models.ColBERT(
    model_name_or_path="jinaai/jina-colbert-v2",
    trust_remote_code=True,
)

The table below lists some of the pre-trained ColBERT models available in PyLate, along with their results on BEIR:

| Model | BEIR AVG | NFCorpus | SciFact | SCIDOCS | FiQA2018 | TRECCOVID | HotpotQA | Touche2020 | ArguAna | ClimateFEVER | FEVER | QuoraRetrieval | NQ | DBPedia |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| lightonai/colbertv2.0 | 50.02 | 33.8 | 69.3 | 15.4 | 35.6 | 73.3 | 66.7 | 26.3 | 46.3 | 17.6 | 78.5 | 85.2 | 56.2 | 44.6 |
| answerdotai/answerai-colbert-small-v1 | 53.79 | 37.3 | 74.77 | 18.42 | 41.15 | 84.59 | 76.11 | 25.69 | 50.09 | 33.07 | 90.96 | 87.72 | 59.1 | 45.58 |
| jinaai/jina-colbert-v2 | 53.1 | 34.6 | 67.8 | 18.6 | 40.8 | 83.4 | 76.6 | 27.4 | 36.6 | 23.9 | 80.05 | 88.7 | 64.0 | 47.1 |
| GTE-ModernColBERT-v1 | 54.89 | 37.93 | 76.34 | 19.06 | 48.51 | 83.59 | 77.32 | 31.23 | 48.51 | 30.62 | 87.44 | 86.61 | 61.8 | 48.3 |

Note

lightonai/colbertv2.0 is the original ColBERTv2 model, made compatible with PyLate before we supported loading Stanford-NLP models directly. We thank Omar Khattab for allowing us to share the model on PyLate.

Defining dense layers

By default, when you create a PyLate model from a base model, PyLate adds a dense layer that projects the model's output dimension to embedding_size. If you do not specify embedding_size, it defaults to 128.

model = models.ColBERT("bert-base-uncased")
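
To make the default concrete, the added layer can be thought of as a linear projection applied to every token embedding. The following is a minimal sketch in plain PyTorch, not PyLate's actual implementation: the hidden size of 768 (that of bert-base-uncased), the bias-free layer (mirroring the bias=False used elsewhere on this page), and the variable names are assumptions for illustration, and a random tensor stands in for the encoder output.

```python
import torch

hidden_size = 768      # assumed output dimension of bert-base-uncased
embedding_size = 128   # default used when embedding_size is not specified

# Stand-in for the encoder output: (batch, tokens, hidden_size).
token_embeddings = torch.randn(2, 32, hidden_size)

# A bias-free linear layer, applied independently to each token vector.
projection = torch.nn.Linear(hidden_size, embedding_size, bias=False)
projected = projection(token_embeddings)

print(projected.shape)  # torch.Size([2, 32, 128])
```

Only the last dimension changes: the model still produces one vector per token, each now of size 128.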

If you create a PyLate model from a sentence-transformers (ST) model, PyLate loads that model's dense layers and adds another one only if you specify an embedding_size that does not match the output size of the ST model's last dense layer.
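
The rules from the two paragraphs above can be summarized as a small decision function. This is only a sketch of the behavior as described here, not code from PyLate: the helper name needs_extra_dense and its arguments are hypothetical.

```python
from typing import Optional


def needs_extra_dense(last_dense_out: Optional[int], embedding_size: Optional[int]) -> bool:
    """Hypothetical helper: should another dense layer be appended?

    last_dense_out is the output size of the checkpoint's last dense layer,
    or None when the checkpoint is a base model without any dense layer.
    """
    if last_dense_out is None:
        # Base model: a projection is always added (to 128 by default).
        return True
    if embedding_size is None:
        # ST model and no explicit size: keep its dense layers as they are.
        return False
    # ST model with an explicit size: add a layer only on a mismatch.
    return embedding_size != last_dense_out


print(needs_extra_dense(None, None))  # True
print(needs_extra_dense(768, None))   # False
print(needs_extra_dense(768, 768))    # False
print(needs_extra_dense(768, 128))    # True
```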

If you do not want to use the dense layers of the ST model (but still want to use its base weights), you should use the modular syntax:

import torch
from sentence_transformers.models import Transformer
from pylate import models

base_model = Transformer("answerdotai/ModernBERT-base")

dense_1 = models.Dense(
    in_features=768,
    out_features=512,
    bias=False,
    activation_function=torch.nn.GELU(),
)
dense_2 = models.Dense(
    in_features=512,
    out_features=128,
    bias=False,
    activation_function=torch.nn.Identity(),
)

model = models.ColBERT(
    modules=[base_model, dense_1, dense_2],
    document_length=300,
    query_length=32,
)

ColBERT(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
  (1): Dense({'in_features': 768, 'out_features': 512, 'bias': False, 'activation_function': 'torch.nn.modules.activation.GELU', 'use_residual': False})
  (2): Dense({'in_features': 512, 'out_features': 128, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity', 'use_residual': False})
)

The modular syntax also lets you choose the activation function and stack multiple dense layers. You can also append layers to an existing model or remove them, so you can build exactly the module stack you want:

import torch
from pylate import models

# Loading a sentence-transformers model keeps its existing dense layers.
model = models.ColBERT("google/embeddinggemma-300m")
ColBERT(
  (0): Transformer({'max_seq_length': 2048, 'do_lower_case': False, 'architecture': 'Gemma3TextModel'})
  (1): Dense({'in_features': 768, 'out_features': 3072, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity', 'use_residual': False})
  (2): Dense({'in_features': 3072, 'out_features': 768, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity', 'use_residual': False})
)

dense_1 = models.Dense(
    in_features=768,
    out_features=128,
    bias=False,
    activation_function=torch.nn.Identity(),
    use_residual=False,
)

model.append(dense_1)
ColBERT(
  (0): Transformer({'max_seq_length': 2048, 'do_lower_case': False, 'architecture': 'Gemma3TextModel'})
  (1): Dense({'in_features': 768, 'out_features': 3072, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity', 'use_residual': False})
  (2): Dense({'in_features': 3072, 'out_features': 768, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity', 'use_residual': False})
  (3): Dense({'in_features': 768, 'out_features': 128, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity', 'use_residual': False})
)

del model[3]
ColBERT(
  (0): Transformer({'max_seq_length': 2048, 'do_lower_case': False, 'architecture': 'Gemma3TextModel'})
  (1): Dense({'in_features': 768, 'out_features': 3072, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity', 'use_residual': False})
  (2): Dense({'in_features': 3072, 'out_features': 768, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity', 'use_residual': False})
)
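
When appending or removing layers by hand, each layer's in_features has to match the out_features of the layer before it, otherwise the forward pass fails with a shape error. The check below is a small sketch of that constraint. The helper check_dense_chain is hypothetical and works on plain (in_features, out_features) pairs rather than PyLate modules, so it runs without loading a model.

```python
def check_dense_chain(hidden_size, layers):
    """Hypothetical helper: validate a stack of (in_features, out_features) pairs.

    Returns the final embedding size, or raises ValueError on the first mismatch.
    """
    current = hidden_size
    for index, (in_features, out_features) in enumerate(layers, start=1):
        if in_features != current:
            raise ValueError(
                f"layer {index} expects {in_features} features but receives {current}"
            )
        current = out_features
    return current


# The embeddinggemma stack shown above, then with the extra 768 -> 128 layer.
stack = [(768, 3072), (3072, 768)]
print(check_dense_chain(768, stack))                 # 768
print(check_dense_chain(768, stack + [(768, 128)]))  # 128
```

Running such a check before model.append(...) or after del model[...] gives a clearer error message than a failed matrix multiplication deep inside the forward pass.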

Tip

A MixedBread study showed that projecting with an MLP is beneficial compared to a single dense layer. The study explores different depths, activation functions, and the use of residual connections. Please check the paper for a more thorough analysis.

import torch
from sentence_transformers.models import Transformer
from pylate import models

base_model = Transformer("jhu-clsp/ettin-encoder-32m")

dense_1 = models.Dense(
    in_features=384,
    out_features=768,
    bias=False,
    activation_function=torch.nn.Identity(),
    use_residual=True,
)
dense_2 = models.Dense(
    in_features=768,
    out_features=384,
    bias=False,
    activation_function=torch.nn.Identity(),
    use_residual=False,
)

model = models.ColBERT(
    modules=[base_model, dense_1, dense_2],
)
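
The configuration above can be read as a two-layer projection with a skip connection around the first layer. As a rough illustration in plain PyTorch, the sketch below shows one way such a layer can be built. It is an assumption, not a description of PyLate's internals: the class name ResidualDense is hypothetical, and the choice to give the skip path its own bias-free linear map when input and output sizes differ is simply one common way to make the two tensors addable.

```python
import torch
from torch import nn


class ResidualDense(nn.Module):
    """Hypothetical sketch of a dense layer with an optional skip connection."""

    def __init__(self, in_features, out_features, activation=None, use_residual=False):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features, bias=False)
        self.activation = activation if activation is not None else nn.Identity()
        # Assumption: project the skip path when the sizes differ.
        self.skip = None
        if use_residual:
            self.skip = (
                nn.Identity()
                if in_features == out_features
                else nn.Linear(in_features, out_features, bias=False)
            )

    def forward(self, x):
        out = self.activation(self.linear(x))
        if self.skip is not None:
            out = out + self.skip(x)
        return out


# Same sizes as the ettin-encoder-32m example: 384 -> 768 -> 384.
mlp = nn.Sequential(
    ResidualDense(384, 768, use_residual=True),
    ResidualDense(768, 384, use_residual=False),
)
tokens = torch.randn(2, 16, 384)  # stand-in for the encoder output
print(mlp(tokens).shape)  # torch.Size([2, 16, 384])
```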