insert_documents

Insert documents from a Hugging Face dataset into DuckDB.

Parameters

  • database (str)

    The name of the DuckDB database.

  • schema (str)

    The schema in which the documents table is located.

  • key (str)

    The key field that uniquely identifies each document (e.g., 'query_id').

  • url (str)

    The URL of the Hugging Face dataset in Parquet format.

  • config (dict | None) – defaults to None

    Optional configuration options for the DuckDB connection.

  • limit (int | None) – defaults to None

  • dtypes (dict | None) – defaults to None

Examples

>>> from ducksearch import upload

>>> upload.documents(
...     database="test.duckdb",
...     documents="hf://datasets/lightonai/lighton-ms-marco-mini/queries.parquet",
...     key="query_id",
...     fields=["query_id", "text"],
... )
| Table          | Size |
|----------------|------|
| documents      | 19   |
| bm25_documents | 19   |

>>> upload.documents(
...     database="test.duckdb",
...     documents="hf://datasets/lightonai/lighton-ms-marco-mini/documents.parquet",
...     key="document_id",
...     fields=["document_id", "text"],
... )
| Table          | Size |
|----------------|------|
| documents      | 51   |
| bm25_documents | 51   |
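
To make the `schema`, `key`, `url`, and `limit` parameters more concrete, here is a minimal sketch of the kind of SQL statement an insert like this could issue. It is not the ducksearch implementation: `build_insert_sql` is a hypothetical helper, and it assumes that the `documents` table already exists with columns matching the Parquet file. The `main` schema used in the usage lines is also an assumption (DuckDB's default schema).

```python
from typing import Optional


def build_insert_sql(
    schema: str, key: str, url: str, limit: Optional[int] = None
) -> str:
    """Build an INSERT ... SELECT statement that reads a Parquet file by URL.

    Hypothetical helper for illustration only; not part of ducksearch.
    """
    # Quote identifiers so unusual schema or column names stay valid SQL.
    quoted_schema = '"' + schema.replace('"', '""') + '"'
    quoted_key = '"' + key.replace('"', '""') + '"'
    # Escape single quotes inside the URL string literal.
    quoted_url = "'" + url.replace("'", "''") + "'"

    # Assumption: rows without a key cannot be identified, so skip them.
    sql = (
        f"INSERT INTO {quoted_schema}.documents "
        f"SELECT * FROM read_parquet({quoted_url}) "
        f"WHERE {quoted_key} IS NOT NULL"
    )
    # Optionally cap the number of inserted rows.
    if limit is not None:
        sql += f" LIMIT {int(limit)}"
    return sql


sql = build_insert_sql(
    schema="main",
    key="query_id",
    url="hf://datasets/lightonai/lighton-ms-marco-mini/queries.parquet",
    limit=10,
)
print(sql)
```

With the `duckdb` Python package installed, a statement like this could be run with `duckdb.connect("test.duckdb").execute(sql)`. Reading `hf://` paths depends on the DuckDB version and its `httpfs` extension, so treat that part as an assumption to verify against your installation.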