langchain/tests
Jeff Vestal 0b542a9706
Add ElasticsearchEmbeddings class for generating embeddings using Elasticsearch models (#3401)
This PR introduces a new module, `elasticsearch_embeddings.py`, which
provides a wrapper around Elasticsearch embedding models. The new
ElasticsearchEmbeddings class allows users to generate embeddings for
documents and query texts using a [model deployed in an Elasticsearch
cluster](https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-model-ref.html#ml-nlp-model-ref-text-embedding).

### Main features:

1. The ElasticsearchEmbeddings class initializes with an Elasticsearch
connection object and a model_id, providing an interface to interact
with the Elasticsearch ML client through
[infer_trained_model](https://elasticsearch-py.readthedocs.io/en/v8.7.0/api.html?highlight=trained%20model%20infer#elasticsearch.client.MlClient.infer_trained_model)
.
2. The `embed_documents()` method generates embeddings for a list of
documents, and the `embed_query()` method generates an embedding for a
single query text.
3. The class supports custom input text field names in case the deployed
model expects a different field name than the default `text_field`.
4. The implementation is compatible with any model deployed in
Elasticsearch that generates embeddings as output.

### Benefits:

1. Simplifies the process of generating embeddings using Elasticsearch
models.
2. Provides a clean and intuitive interface to interact with the
Elasticsearch ML client.
3. Allows users to easily integrate Elasticsearch-generated embeddings.

Related issue https://github.com/hwchase17/langchain/issues/3400

---------

Co-authored-by: Dev 2049 <dev.dev2049@gmail.com>
2023-05-23 14:50:33 -07:00
..
integration_tests Add ElasticsearchEmbeddings class for generating embeddings using Elasticsearch models (#3401) 2023-05-23 14:50:33 -07:00
mock_servers Add a mock server (#2443) 2023-04-05 10:35:46 -07:00
unit_tests Add AzureCognitiveServicesToolkit to call Azure Cognitive Services API (#5012) 2023-05-23 06:45:48 -07:00
__init__.py initial commit 2022-10-24 14:51:15 -07:00
data.py Add workflow for testing with all deps (#4410) 2023-05-10 09:35:07 -04:00
README.md feat: improve pinecone tests (#2806) 2023-04-13 21:49:31 -07:00

Readme tests(draft)

Integrations Tests

Prepare

This repository contains functional tests for several search engines and databases. The tests aim to verify the correct behavior of the engines and databases according to their specifications and requirements.

To run some integration tests, such as tests located in tests/integration_tests/vectorstores/, you will need to install the following software:

  • Docker
  • Python 3.8.1 or later

We have optional group test_integration in the pyproject.toml file. This group should contain dependencies for the integration tests and can be installed using the command:

poetry install --with test_integration

Any new dependencies should be added by running:

# add package and install it after adding:
poetry add tiktoken@latest --group "test_integration" && poetry install --with test_integration

Before running any tests, you should start a specific Docker container that has all the necessary dependencies installed. For instance, we use the elasticsearch.yml container for test_elasticsearch.py:

cd tests/integration_tests/vectorstores/docker-compose
docker-compose -f elasticsearch.yml up

Prepare environment variables for local testing:

  • copy tests/.env.example to tests/.env
  • set variables in tests/.env file, e.g OPENAI_API_KEY

Additionally, it's important to note that some integration tests may require certain environment variables to be set, such as OPENAI_API_KEY. Be sure to set any required environment variables before running the tests to ensure they run correctly.

Recording HTTP interactions with pytest-vcr

Some of the integration tests in this repository involve making HTTP requests to external services. To prevent these requests from being made every time the tests are run, we use pytest-vcr to record and replay HTTP interactions.

When running tests in a CI/CD pipeline, you may not want to modify the existing cassettes. You can use the --vcr-record=none command-line option to disable recording new cassettes. Here's an example:

pytest --log-cli-level=10 tests/integration_tests/vectorstores/test_pinecone.py --vcr-record=none
pytest tests/integration_tests/vectorstores/test_elasticsearch.py --vcr-record=none

Run some tests with coverage:

pytest tests/integration_tests/vectorstores/test_elasticsearch.py --cov=langchain --cov-report=html
start "" htmlcov/index.html || open htmlcov/index.html