Skip to content

queries

Upload queries to DuckDB, map documents to queries, and index using BM25.

Parameters

  • database (str)

    Name of the DuckDB database.

  • queries (list[str] | None) – defaults to None

    List of queries to upload. Each query is a string.

  • documents_queries (dict[list]) – defaults to None

    Dictionary mapping document IDs to a list of queries.

  • k1 (float) – defaults to 1.5

    BM25 k1 parameter, controls term saturation.

  • b (float) – defaults to 0.75

    BM25 b parameter, controls document length normalization.

  • stemmer (str) – defaults to porter

    Stemming algorithm to use (e.g., 'porter'). The type of stemmer to be used. One of 'arabic', 'basque', 'catalan', 'danish', 'dutch', 'english', 'finnish', 'french', 'german', 'greek', 'hindi', 'hungarian', 'indonesian', 'irish', 'italian', 'lithuanian', 'nepali', 'norwegian', 'porter', 'portuguese', 'romanian', 'russian', 'serbian', 'spanish', 'swedish', 'tamil', 'turkish', or 'none' if no stemming is to be used.

  • stopwords (str | list[str]) – defaults to None

    List of stopwords to exclude from indexing. Default can be a custom list or a language string.

  • ignore (str) – defaults to (\.|[^a-z])+

    Regular expression pattern to ignore characters when indexing. Default ignore punctuation and non-alphabetic characters.

  • strip_accents (bool) – defaults to True

    Whether to remove accents from characters during indexing.

  • lower (bool) – defaults to True

  • batch_size (int) – defaults to 30000

    Number of queries to process per batch.

  • config (dict | None) – defaults to None

    Optional configuration dictionary for the DuckDB connection and other settings.