insert_documents
Insert documents into the documents table with optional multi-threading.
Parameters
- database (str)
  The name of the DuckDB database.
- schema (str)
  The schema in which the documents table is located.
- df (list[dict] | str)
  The list of document dictionaries to insert, or a string (URL) pointing to a Hugging Face dataset.
- key (str)
  The field that uniquely identifies each document (e.g., 'id').
- columns (list[str] | str)
  The list of document fields to insert. Can be a string when inserting a single field.
- dtypes (dict[str, str] | None) – defaults to None
  Optional dictionary specifying the DuckDB type for each field. Fields without an entry default to 'VARCHAR'.
- batch_size (int) – defaults to 30000
  The number of documents to insert in each batch.
- n_jobs (int) – defaults to -1
  The number of parallel jobs to use for inserting documents. The default, -1, uses all available processors.
- config (dict | None) – defaults to None
  Optional configuration options for the DuckDB connection.
- limit (int | None) – defaults to None
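The columns and dtypes parameters work together: a single string is treated as one field, and any field without an explicit type falls back to 'VARCHAR'. The sketch below illustrates that normalization with a hypothetical helper, `resolve_column_types`. It is not ducksearch's own code, and it only covers the fields listed in columns (how the key field is typed is not stated here, so it is left out).

```python
from __future__ import annotations


def resolve_column_types(
    columns: list[str] | str,
    dtypes: dict[str, str] | None = None,
) -> dict[str, str]:
    """Hypothetical helper: map each inserted field to a DuckDB type name."""
    # A single string means a single field.
    if isinstance(columns, str):
        columns = [columns]
    dtypes = dtypes or {}
    # Fields without an explicit type fall back to VARCHAR.
    return {column: dtypes.get(column, "VARCHAR") for column in columns}


print(resolve_column_types(["title", "text", "year"], {"year": "INT"}))
# {'title': 'VARCHAR', 'text': 'VARCHAR', 'year': 'INT'}
```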
Examples
>>> from ducksearch import tables
>>> df = [
...     {"id": 1, "title": "title document 1", "text": "text document 1"},
...     {"id": 2, "title": "title document 2", "text": "text document 2"},
...     {"id": 3, "title": "title document 3", "text": "text document 3"},
... ]
>>> _ = tables.insert_documents(
...     database="test.duckdb",
...     schema="bm25_tables",
...     key="id",
...     columns=["title", "text"],
...     df=df
... )
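As a rough illustration of how batch_size and n_jobs could combine, the sketch below splits the documents into batches of at most batch_size items and hands them to a thread pool, treating n_jobs=-1 as "use os.cpu_count() workers". The names `make_batches`, `resolve_n_jobs`, `insert_batch` and `insert_in_batches` are hypothetical, and `insert_batch` only counts rows; a real version would write each batch to the DuckDB table, and ducksearch may schedule its work differently.

```python
from __future__ import annotations

import os
from concurrent.futures import ThreadPoolExecutor


def make_batches(documents: list[dict], batch_size: int) -> list[list[dict]]:
    """Split documents into consecutive chunks of at most batch_size items."""
    return [documents[i : i + batch_size] for i in range(0, len(documents), batch_size)]


def resolve_n_jobs(n_jobs: int) -> int:
    """Treat -1 as 'all available processors'; clamp anything else to >= 1 (assumption)."""
    if n_jobs == -1:
        return os.cpu_count() or 1
    return max(1, n_jobs)


def insert_batch(batch: list[dict]) -> int:
    """Hypothetical placeholder: a real version would write the batch to DuckDB."""
    return len(batch)


def insert_in_batches(documents: list[dict], batch_size: int = 30000, n_jobs: int = -1) -> int:
    batches = make_batches(documents, batch_size)
    with ThreadPoolExecutor(max_workers=resolve_n_jobs(n_jobs)) as pool:
        # Each batch goes to one worker; sum the row counts reported back.
        return sum(pool.map(insert_batch, batches))


docs = [{"id": i, "text": f"text document {i}"} for i in range(7)]
print(len(make_batches(docs, 3)))  # 3 batches: sizes 3, 3 and 1
print(insert_in_batches(docs, batch_size=3, n_jobs=2))  # 7
```

Keeping the batching separate from the worker count means batch_size bounds memory per write, while n_jobs only controls how many batches are in flight at once.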