insert_documents¶
Insert documents into the documents table with optional multi-threading.
Parameters¶
-
database (str)
The name of the DuckDB database.
-
schema (str)
The schema in which the documents table is located.
-
df (list[dict] | str)
The documents to insert: either a list of document dictionaries or a string (URL) pointing to a Hugging Face dataset.
-
key (str)
The field that uniquely identifies each document (e.g., 'id').
-
columns (list[str] | str)
The list of document fields to insert. Can be a string if inserting a single field.
-
dtypes (dict[str, str] | None) – defaults to
None
Optional dictionary specifying the DuckDB type for each field. Defaults to 'VARCHAR' for all unspecified fields.
-
batch_size (int) – defaults to
30000
The number of documents to insert in each batch.
-
n_jobs (int) – defaults to
-1
Number of parallel jobs to use for inserting documents. The default of -1 uses all available processors.
-
config (dict | None) – defaults to
None
Optional configuration options for the DuckDB connection.
-
limit (int | None) – defaults to
None
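The descriptions of `columns` and `dtypes` above suggest two normalization steps: a single string is treated as a one-field list, and any field without an explicit type falls back to 'VARCHAR'. The sketch below only illustrates that documented behavior. `resolve_dtypes` is a hypothetical helper written for this example, not a ducksearch function, and the library's internals may differ.

```python
def resolve_dtypes(columns, dtypes=None):
    """Hypothetical helper: map each field to a DuckDB type name.

    Assumption: this mirrors the documented defaults only; it is not
    taken from the ducksearch source code.
    """
    # A single field may be passed as a plain string.
    if isinstance(columns, str):
        columns = [columns]
    dtypes = dtypes or {}
    # Fields without an explicit type default to VARCHAR.
    return {column: dtypes.get(column, "VARCHAR") for column in columns}


resolved = resolve_dtypes(["title", "text", "year"], dtypes={"year": "INTEGER"})
print(resolved)
```

Here only `year` receives an explicit type, so `title` and `text` fall back to 'VARCHAR'.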
Examples¶
>>> from ducksearch import tables
>>> df = [
... {"id": 1, "title": "title document 1", "text": "text document 1"},
... {"id": 2, "title": "title document 2", "text": "text document 2"},
... {"id": 3, "title": "title document 3", "text": "text document 3"},
... ]
>>> _ = tables.insert_documents(
... database="test.duckdb",
... schema="bm25_tables",
... key="id",
... columns=["title", "text"],
... df=df
... )
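To make the effect of `batch_size` concrete, the following sketch splits a list of documents into consecutive batches of at most `batch_size` items. It is an illustration of batching in general, assuming nothing about how ducksearch schedules or inserts the batches. `iter_batches` is a hypothetical helper, not part of the library.

```python
from itertools import islice


def iter_batches(documents, batch_size=30000):
    """Hypothetical helper: yield consecutive lists of at most batch_size documents."""
    iterator = iter(documents)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


df = [
    {"id": i, "title": f"title document {i}", "text": f"text document {i}"}
    for i in range(1, 8)
]

# Seven documents with batch_size=3 give batches of 3, 3, and 1 documents.
sizes = [len(batch) for batch in iter_batches(df, batch_size=3)]
print(sizes)
```

With the default `batch_size` of 30000, a small list like the one in the example above would fit in a single batch.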