This enables bulk args like `chunk_size` to be passed down from the
ingest methods (`from_texts`, `from_documents`) to the bulk API.
This helps alleviate timeouts that occurred when bulk importing a large
number of documents into Elasticsearch.
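The pass-through described above can be sketched as follows (`from_texts` and `_bulk` here are simplified stand-ins that only count batches, not the actual implementation in this PR):

```python
def _bulk(actions, chunk_size=500, max_chunk_bytes=100 * 1024 * 1024):
    """Stand-in for the Elasticsearch bulk API: batches actions by chunk_size."""
    batches = [actions[i:i + chunk_size] for i in range(0, len(actions), chunk_size)]
    return len(batches)  # number of bulk requests issued


def from_texts(texts, **bulk_kwargs):
    """Stand-in ingest method: forwards bulk args straight to the bulk API."""
    actions = [{"_source": {"text": t}} for t in texts]
    return _bulk(actions, **bulk_kwargs)


# Smaller chunks mean more, smaller bulk requests, which are less likely
# to time out on slow indexing pipelines.
from_texts(["doc"] * 1000)                 # default chunk_size=500 -> 2 requests
from_texts(["doc"] * 1000, chunk_size=50)  # -> 20 requests
```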
Contribution Shoutout
- @elastic
- [x] Updated Integration tests
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
@@ -18,7 +18,7 @@ Example: Run a single-node Elasticsearch instance with security disabled. This i
#### Deploy Elasticsearch on Elastic Cloud
Elastic Cloud is a managed Elasticsearch service. Signup for a [free trial](https://cloud.elastic.co/registration?storm=langchain-notebook).
Elastic Cloud is a managed Elasticsearch service. Sign up for a [free trial](https://cloud.elastic.co/registration?utm_source=langchain&utm_content=documentation).
"There are two main ways to set up an Elasticsearch instance:\n",
"\n",
"1. Elastic Cloud: Elastic Cloud is a managed Elasticsearch service. Signup for a [free trial](https://cloud.elastic.co/registration?storm=langchain-notebook).\n",
"1. Elastic Cloud: Elastic Cloud is a managed Elasticsearch service. Sign up for a [free trial](https://cloud.elastic.co/registration?utm_source=langchain&utm_content=documentation).\n",
"\n",
"To connect to an Elasticsearch instance that does not require\n",
"login credentials (starting the Docker instance with security disabled), pass the Elasticsearch URL and index name along with the\n",
@@ -662,7 +662,7 @@
"id": "0960fa0a",
"metadata": {},
"source": [
"# Customise the Query\n",
"## Customise the Query\n",
"With the `custom_query` parameter at search time, you can adjust the query used to retrieve documents from Elasticsearch. This is useful if you want to use a more complex query, for example to support linear boosting of fields."
]
},
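As a sketch of the `custom_query` hook described above, the hook receives the generated query body and the query string and returns a modified body; the field names and boost weights below are illustrative, not part of the PR:

```python
def custom_query(query_body: dict, query: str) -> dict:
    """Replace the default query with a multi_match that boosts the title field."""
    return {
        "query": {
            "multi_match": {
                "query": query,
                # Linear boosting: matches in "title" count 3x more than "content".
                "fields": ["title^3", "content"],
            }
        }
    }


# Hypothetical usage (assumes `db` is an ElasticsearchStore instance):
# results = db.similarity_search("harry potter", custom_query=custom_query)
```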
@@ -720,6 +720,35 @@
"print(results[0])"
]
},
{
"cell_type": "markdown",
"id": "3242fd42",
"metadata": {},
"source": [
"# FAQ\n",
"\n",
"## Question: I'm getting timeout errors when indexing documents into Elasticsearch. How do I fix this?\n",
"One likely cause is that your documents are taking too long to index into Elasticsearch. ElasticsearchStore uses the Elasticsearch bulk API, which has a few defaults you can adjust to reduce the chance of timeout errors.\n",
"\n",
"This is also a good idea when you're using `SparseVectorRetrievalStrategy`.\n",
"\n",
"The defaults are:\n",
"- `chunk_size`: 500\n",
"- `max_chunk_bytes`: 100MB\n",
"\n",
"To adjust these, you can pass in the `chunk_size` and `max_chunk_bytes` parameters to the ElasticsearchStore `add_texts` method.\n",
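As a rough model of how those two limits interact — assuming, as in the elasticsearch-py bulk helpers, that a chunk is flushed when it reaches `chunk_size` documents or `max_chunk_bytes`, whichever comes first (the numbers below are illustrative):

```python
def estimate_bulk_requests(n_docs, avg_doc_bytes, chunk_size=500,
                           max_chunk_bytes=100 * 1024 * 1024):
    """Estimate how many bulk requests a corpus produces: a chunk closes at
    chunk_size docs or max_chunk_bytes of payload, whichever comes first."""
    docs_per_chunk = min(chunk_size, max(1, max_chunk_bytes // avg_doc_bytes))
    return -(-n_docs // docs_per_chunk)  # ceiling division


# 10,000 small (2 KB) docs: chunk_size is the binding limit -> 20 requests.
estimate_bulk_requests(10_000, avg_doc_bytes=2_000)
# Lowering chunk_size to 50 -> 200 smaller, timeout-friendlier requests.
estimate_bulk_requests(10_000, avg_doc_bytes=2_000, chunk_size=50)
# 1 MB docs: max_chunk_bytes binds at 100 docs per chunk instead.
estimate_bulk_requests(10_000, avg_doc_bytes=1_048_576)
```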