
insert_documents

Insert documents into the documents table with optional multi-threading.

Parameters

  • database (str)

    The name of the DuckDB database.

  • schema (str)

    The schema in which the documents table is located.

  • df (list[dict] | str)

    A list of document dictionaries, or a string (URL) pointing to a Hugging Face dataset to insert.

  • key (str)

    The field that uniquely identifies each document (e.g., 'id').

  • columns (list[str] | str)

    The list of document fields to insert. Can be a string if inserting a single field.

  • dtypes (dict[str, str] | None) – defaults to None

    Optional dictionary mapping fields to DuckDB types. Any field not listed defaults to 'VARCHAR'.

  • batch_size (int) – defaults to 30000

    The number of documents to insert in each batch.

  • n_jobs (int) – defaults to -1

    Number of parallel jobs to use for inserting documents. The default, -1, uses all available processors.

  • config (dict | None) – defaults to None

    Optional configuration options for the DuckDB connection.

  • limit (int | None) – defaults to None
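The column and type parameters combine simply: a string passed as columns stands for a single field, and any field missing from dtypes falls back to 'VARCHAR'. The sketch below illustrates that resolution, along with the common (joblib-style) convention that n_jobs=-1 means one job per available processor. The helpers resolve_columns, resolve_dtypes and resolve_n_jobs are hypothetical; this is a sketch of the documented behaviour, not ducksearch's internal code.

```python
from __future__ import annotations

import os


def resolve_columns(columns: list[str] | str) -> list[str]:
    """Hypothetical helper: wrap a single field name in a list."""
    return [columns] if isinstance(columns, str) else list(columns)


def resolve_dtypes(
    columns: list[str] | str, dtypes: dict[str, str] | None = None
) -> dict[str, str]:
    """Hypothetical helper: map each inserted field to a DuckDB type.

    Fields absent from `dtypes` fall back to 'VARCHAR', as documented above.
    """
    dtypes = dtypes or {}
    return {column: dtypes.get(column, "VARCHAR") for column in resolve_columns(columns)}


def resolve_n_jobs(n_jobs: int = -1) -> int:
    """Hypothetical helper, assuming the joblib convention: -1 = all processors."""
    if n_jobs == -1:
        return os.cpu_count() or 1
    return n_jobs


print(resolve_dtypes("text"))
print(resolve_dtypes(["title", "year"], {"year": "INTEGER"}))
```

With these assumptions, the first call yields {'text': 'VARCHAR'} and the second {'title': 'VARCHAR', 'year': 'INTEGER'}.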

Examples

>>> from ducksearch import tables

>>> df = [
...     {"id": 1, "title": "title document 1", "text": "text document 1"},
...     {"id": 2, "title": "title document 2", "text": "text document 2"},
...     {"id": 3, "title": "title document 3", "text": "text document 3"},
... ]

>>> _ = tables.insert_documents(
...     database="test.duckdb",
...     schema="bm25_tables",
...     key="id",
...     columns=["title", "text"],
...     df=df
... )
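Documents are written in batches of batch_size. As a rough illustration, and assuming contiguous, order-preserving chunks (the library may split its input differently), the three documents above with a batch_size of 2 would be written in two batches. The iter_batches helper is hypothetical.

```python
from collections.abc import Iterator


def iter_batches(documents: list[dict], batch_size: int = 30000) -> Iterator[list[dict]]:
    """Hypothetical helper: yield consecutive slices of at most batch_size documents."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(documents), batch_size):
        yield documents[start : start + batch_size]


df = [
    {"id": 1, "title": "title document 1", "text": "text document 1"},
    {"id": 2, "title": "title document 2", "text": "text document 2"},
    {"id": 3, "title": "title document 3", "text": "text document 3"},
]

# Print the ids contained in each batch.
for batch in iter_batches(df, batch_size=2):
    print([document["id"] for document in batch])
# [1, 2]
# [3]
```

With the default batch_size of 30000, this small example would fit in a single batch.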