langchain/tests/integration_tests/embeddings/test_elasticsearch.py
Jeff Vestal 0b542a9706
Add ElasticsearchEmbeddings class for generating embeddings using Elasticsearch models (#3401)
This PR introduces a new module, `elasticsearch_embeddings.py`, which
provides a wrapper around Elasticsearch embedding models. The new
ElasticsearchEmbeddings class allows users to generate embeddings for
documents and query texts using a [model deployed in an Elasticsearch
cluster](https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-model-ref.html#ml-nlp-model-ref-text-embedding).

### Main features:

1. The ElasticsearchEmbeddings class initializes with an Elasticsearch
connection object and a model_id, providing an interface to interact
with the Elasticsearch ML client through
[infer_trained_model](https://elasticsearch-py.readthedocs.io/en/v8.7.0/api.html?highlight=trained%20model%20infer#elasticsearch.client.MlClient.infer_trained_model)
.
2. The `embed_documents()` method generates embeddings for a list of
documents, and the `embed_query()` method generates an embedding for a
single query text.
3. The class supports custom input text field names in case the deployed
model expects a different field name than the default `text_field`.
4. The implementation is compatible with any model deployed in
Elasticsearch that generates embeddings as output.

### Benefits:

1. Simplifies the process of generating embeddings using Elasticsearch
models.
2. Provides a clean and intuitive interface to interact with the
Elasticsearch ML client.
3. Allows users to easily integrate Elasticsearch-generated embeddings.

Related issue https://github.com/hwchase17/langchain/issues/3400

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-23 14:50:33 -07:00

31 lines
1.1 KiB
Python

"""Test elasticsearch_embeddings embeddings."""
import pytest
from langchain.embeddings.elasticsearch import ElasticsearchEmbeddings
@pytest.fixture
def model_id() -> str:
# Replace with your actual model_id
return "your_model_id"
def test_elasticsearch_embedding_documents(model_id: str) -> None:
"""Test Elasticsearch embedding documents."""
documents = ["foo bar", "bar foo", "foo"]
embedding = ElasticsearchEmbeddings.from_credentials(model_id)
output = embedding.embed_documents(documents)
assert len(output) == 3
assert len(output[0]) == 768 # Change 768 to the expected embedding size
assert len(output[1]) == 768 # Change 768 to the expected embedding size
assert len(output[2]) == 768 # Change 768 to the expected embedding size
def test_elasticsearch_embedding_query(model_id: str) -> None:
"""Test Elasticsearch embedding query."""
document = "foo bar"
embedding = ElasticsearchEmbeddings.from_credentials(model_id)
output = embedding.embed_query(document)
assert len(output) == 768 # Change 768 to the expected embedding size