mirror of
https://github.com/hwchase17/langchain
synced 2024-10-31 15:20:26 +00:00
0b542a9706
This PR introduces a new module, `elasticsearch_embeddings.py`, which provides a wrapper around Elasticsearch embedding models. The new `ElasticsearchEmbeddings` class allows users to generate embeddings for documents and query texts using a [model deployed in an Elasticsearch cluster](https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-model-ref.html#ml-nlp-model-ref-text-embedding).

### Main features:

1. The `ElasticsearchEmbeddings` class initializes with an Elasticsearch connection object and a `model_id`, providing an interface to interact with the Elasticsearch ML client through [infer_trained_model](https://elasticsearch-py.readthedocs.io/en/v8.7.0/api.html?highlight=trained%20model%20infer#elasticsearch.client.MlClient.infer_trained_model).
2. The `embed_documents()` method generates embeddings for a list of documents, and the `embed_query()` method generates an embedding for a single query text.
3. The class supports custom input text field names in case the deployed model expects a different field name than the default `text_field`.
4. The implementation is compatible with any model deployed in Elasticsearch that generates embeddings as output.

### Benefits:

1. Simplifies the process of generating embeddings using Elasticsearch models.
2. Provides a clean and intuitive interface to interact with the Elasticsearch ML client.
3. Allows users to easily integrate Elasticsearch-generated embeddings.

Related issue https://github.com/hwchase17/langchain/issues/3400

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
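The flow described above (build one document per input text, call `infer_trained_model`, collect the returned vectors) can be sketched without a running cluster. This is only an illustration, not the module's actual code: `StubMlClient` and `SketchEmbeddings` are hypothetical names, the stub returns made-up 3-dimensional vectors, and the response shape (`inference_results` entries carrying `predicted_value`) is an assumption about what a text-embedding model returns.

```python
from typing import Any, Dict, List


class StubMlClient:
    """Hypothetical stand-in for the Elasticsearch ML client (no network calls)."""

    def infer_trained_model(
        self, model_id: str, docs: List[Dict[str, str]]
    ) -> Dict[str, Any]:
        # Assumed response shape: one result per input doc, with the
        # embedding stored under "predicted_value".
        results = []
        for doc in docs:
            text = next(iter(doc.values()))
            # Fake 3-dimensional "embedding" derived from the text length.
            results.append({"predicted_value": [float(len(text)), 0.0, 1.0]})
        return {"inference_results": results}


class SketchEmbeddings:
    """Illustrative wrapper; not the real ElasticsearchEmbeddings class."""

    def __init__(
        self, client: Any, model_id: str, input_field: str = "text_field"
    ) -> None:
        self.client = client
        self.model_id = model_id
        # Field name the deployed model expects its input text under.
        self.input_field = input_field

    def _embed(self, texts: List[str]) -> List[List[float]]:
        response = self.client.infer_trained_model(
            model_id=self.model_id,
            docs=[{self.input_field: text} for text in texts],
        )
        return [item["predicted_value"] for item in response["inference_results"]]

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # One embedding per document, in input order.
        return self._embed(texts)

    def embed_query(self, text: str) -> List[float]:
        # A query is embedded as a single-element batch.
        return self._embed([text])[0]


embeddings = SketchEmbeddings(StubMlClient(), "your_model_id")
vectors = embeddings.embed_documents(["foo bar", "bar foo", "foo"])
query_vector = embeddings.embed_query("foo bar")
```

Passing a different `input_field` changes only the key used in each document sent to the client, which is how a model expecting a non-default field name would be accommodated in this sketch.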
31 lines
1.1 KiB
Python
"""Test elasticsearch_embeddings embeddings."""

import pytest

from langchain.embeddings.elasticsearch import ElasticsearchEmbeddings


@pytest.fixture
def model_id() -> str:
    # Replace with your actual model_id
    return "your_model_id"


def test_elasticsearch_embedding_documents(model_id: str) -> None:
    """Test Elasticsearch embedding documents."""
    documents = ["foo bar", "bar foo", "foo"]
    embedding = ElasticsearchEmbeddings.from_credentials(model_id)
    output = embedding.embed_documents(documents)
    assert len(output) == 3
    assert len(output[0]) == 768  # Change 768 to the expected embedding size
    assert len(output[1]) == 768  # Change 768 to the expected embedding size
    assert len(output[2]) == 768  # Change 768 to the expected embedding size


def test_elasticsearch_embedding_query(model_id: str) -> None:
    """Test Elasticsearch embedding query."""
    document = "foo bar"
    embedding = ElasticsearchEmbeddings.from_credentials(model_id)
    output = embedding.embed_query(document)
    assert len(output) == 768  # Change 768 to the expected embedding size
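
The tests above check the vector count and then each vector's length separately, with the expected size repeated on every line. One possible way to fold those checks together is a small helper. This is a minimal sketch: `assert_embedding_shape` is a hypothetical name that is not part of LangChain or pytest, and the sample vectors are made up rather than real model output.

```python
from typing import Sequence


def assert_embedding_shape(
    vectors: Sequence[Sequence[float]], expected_count: int, expected_dim: int
) -> None:
    """Assert there are expected_count vectors, each of length expected_dim."""
    assert len(vectors) == expected_count, (
        f"expected {expected_count} vectors, got {len(vectors)}"
    )
    for index, vector in enumerate(vectors):
        assert len(vector) == expected_dim, (
            f"vector {index} has length {len(vector)}, expected {expected_dim}"
        )


# Example with made-up 4-dimensional vectors (not real model output).
sample = [[0.0] * 4, [1.0] * 4, [2.0] * 4]
assert_embedding_shape(sample, expected_count=3, expected_dim=4)
```

With a helper like this, the expected embedding size would need to be changed in one place per test instead of on each assertion line.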