langchain/tests/integration_tests/embeddings/test_elasticsearch.py


Add ElasticsearchEmbeddings class for generating embeddings using Elasticsearch models (#3401)

This PR introduces a new module, `elasticsearch_embeddings.py`, which provides a wrapper around Elasticsearch embedding models. The new ElasticsearchEmbeddings class allows users to generate embeddings for documents and query texts using a [model deployed in an Elasticsearch cluster](https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-model-ref.html#ml-nlp-model-ref-text-embedding).

### Main features:

1. The ElasticsearchEmbeddings class initializes with an Elasticsearch connection object and a model_id, providing an interface to interact with the Elasticsearch ML client through [infer_trained_model](https://elasticsearch-py.readthedocs.io/en/v8.7.0/api.html?highlight=trained%20model%20infer#elasticsearch.client.MlClient.infer_trained_model).
2. The `embed_documents()` method generates embeddings for a list of documents, and the `embed_query()` method generates an embedding for a single query text.
3. The class supports custom input text field names in case the deployed model expects a different field name than the default `text_field`.
4. The implementation is compatible with any model deployed in Elasticsearch that generates embeddings as output.

### Benefits:

1. Simplifies the process of generating embeddings using Elasticsearch models.
2. Provides a clean and intuitive interface to interact with the Elasticsearch ML client.
3. Allows users to easily integrate Elasticsearch-generated embeddings.

Related issue https://github.com/hwchase17/langchain/issues/3400

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-23 21:50:33 +00:00
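
The commit description says the class forwards texts to the ML client's `infer_trained_model` call under a configurable input field. A minimal sketch of that flow follows. It is not the LangChain implementation: `StubMlClient` and `SketchEmbeddings` are hypothetical names, the stub stands in for a real cluster, and the response layout (`inference_results` entries carrying a `predicted_value`) is an assumption about what a text-embedding model returns.

```python
from typing import Any, Dict, List


class StubMlClient:
    """Hypothetical stand-in for the Elasticsearch ML client (illustration only)."""

    def __init__(self, dims: int = 4) -> None:
        self.dims = dims

    def infer_trained_model(
        self, model_id: str, docs: List[Dict[str, str]]
    ) -> Dict[str, Any]:
        # Assumed response shape: one result per input doc, each with a vector.
        # The fake vector repeats the text length so results are deterministic.
        return {
            "inference_results": [
                {"predicted_value": [float(len(text))] * self.dims}
                for doc in docs
                for text in doc.values()
            ]
        }


class SketchEmbeddings:
    """Hypothetical wrapper mirroring the behaviour described in the commit message."""

    def __init__(
        self, client: Any, model_id: str, input_field: str = "text_field"
    ) -> None:
        self.client = client
        self.model_id = model_id
        self.input_field = input_field

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # Wrap each text under the field name the deployed model expects.
        response = self.client.infer_trained_model(
            model_id=self.model_id,
            docs=[{self.input_field: text} for text in texts],
        )
        return [r["predicted_value"] for r in response["inference_results"]]

    def embed_query(self, text: str) -> List[float]:
        # A query is embedded as a single-document batch.
        return self.embed_documents([text])[0]
```

With a real cluster, the stub would be replaced by the `ml` client of an `elasticsearch.Elasticsearch` connection, and the vector size would depend on the deployed model.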
"""Test elasticsearch_embeddings embeddings."""

import pytest

from langchain.embeddings.elasticsearch import ElasticsearchEmbeddings


@pytest.fixture
def model_id() -> str:
    # Replace with your actual model_id
    return "your_model_id"


def test_elasticsearch_embedding_documents(model_id: str) -> None:
    """Test Elasticsearch embedding documents."""
    documents = ["foo bar", "bar foo", "foo"]
    embedding = ElasticsearchEmbeddings.from_credentials(model_id)
    output = embedding.embed_documents(documents)
    assert len(output) == 3
    assert len(output[0]) == 768  # Change 768 to the expected embedding size
    assert len(output[1]) == 768  # Change 768 to the expected embedding size
    assert len(output[2]) == 768  # Change 768 to the expected embedding size


def test_elasticsearch_embedding_query(model_id: str) -> None:
    """Test Elasticsearch embedding query."""
    document = "foo bar"
    embedding = ElasticsearchEmbeddings.from_credentials(model_id)
    output = embedding.embed_query(document)
    assert len(output) == 768  # Change 768 to the expected embedding size
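
The repeated per-document length checks in the tests above could be folded into one helper, so the expected embedding size is stated once. A minimal sketch, assuming nothing beyond the standard library; `assert_embedding_shape` is a hypothetical name and not part of LangChain or pytest.

```python
from typing import Sequence


def assert_embedding_shape(
    vectors: Sequence[Sequence[float]], expected_count: int, expected_dims: int
) -> None:
    """Check that there are expected_count vectors, each of length expected_dims."""
    assert len(vectors) == expected_count, (
        f"expected {expected_count} embeddings, got {len(vectors)}"
    )
    for index, vector in enumerate(vectors):
        assert len(vector) == expected_dims, (
            f"embedding {index} has {len(vector)} dimensions, "
            f"expected {expected_dims}"
        )
```

In the document test this would read `assert_embedding_shape(output, 3, 768)`, with 768 still to be replaced by the size the deployed model actually produces.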