langchain/libs/community/tests/unit_tests/embeddings/test_yandex.py
Egor Krasheninnikov c8391d4ff1
community[patch]: Fix YandexGPT embeddings (#19720)
Fixes the YandexGPT embeddings.

The current version uses a single `model_name` for both queries and
documents, which makes the `embed_documents` and `embed_query` methods
effectively identical. Yandex provides a separate model URI (`model_uri`)
for encoding documents; see the
[YandexGPT embeddings documentation](https://yandex.cloud/en/docs/yandexgpt/concepts/embeddings).
The bug may affect retrievers built with `YandexGPTEmbeddings` (for
instance, a FAISS vector store used as a retriever), since they call both
`embed_documents` and `embed_query`.

A simple snippet to test the behaviour:
```python
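from langchain_community.embeddings.yandex import YandexGPTEmbeddings

# Assumes Yandex Cloud credentials (e.g. YC_API_KEY) and a folder ID are configured.
embeddings = YandexGPTEmbeddings()
q_emb = embeddings.embed_query('hello world')
doc_emb = embeddings.embed_documents(['hello world', 'hello world'])
# Compare the query embedding with the embedding of the same text as a document.
print(q_emb == doc_emb[0])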
```
The comparison evaluates to `True` with the current version and `False`
with these changes.
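
The query/document split can also be illustrated offline. The sketch below is not the real `YandexGPTEmbeddings` class: `FakeYandexEmbeddings`, its `_embed` placeholder and the `model_version` field are hypothetical stand-ins, and the `emb://<folder_id>/<model_name>/latest` pattern is inferred from the values asserted in the unit test. It only shows why the two methods must resolve different URIs:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FakeYandexEmbeddings:
    """Hypothetical offline stand-in; it never calls the Yandex API."""

    folder_id: str
    query_model_name: str = "text-search-query"
    doc_model_name: str = "text-search-doc"
    model_version: str = "latest"  # assumed field name, for illustration only

    def _uri(self, model_name: str) -> str:
        # URI pattern inferred from the unit test's expected values.
        return f"emb://{self.folder_id}/{model_name}/{self.model_version}"

    @property
    def query_model_uri(self) -> str:
        return self._uri(self.query_model_name)

    @property
    def doc_model_uri(self) -> str:
        return self._uri(self.doc_model_name)

    def _embed(self, text: str, model_uri: str) -> Tuple[str, str]:
        # Placeholder "embedding": tagging the text with the URI makes the
        # query path and the document path distinguishable.
        return (model_uri, text)

    def embed_query(self, text: str) -> Tuple[str, str]:
        return self._embed(text, self.query_model_uri)

    def embed_documents(self, texts: List[str]) -> List[Tuple[str, str]]:
        return [self._embed(t, self.doc_model_uri) for t in texts]


emb = FakeYandexEmbeddings(folder_id="bar")
q_emb = emb.embed_query("hello world")
doc_emb = emb.embed_documents(["hello world", "hello world"])
print(q_emb == doc_emb[0])
```

With a single shared model name, both methods would resolve the same URI and the comparison would be equal, which is the behaviour this change removes.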


Twitter: @egor_krash

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-04-13 16:23:01 -07:00


import os

from langchain_community.embeddings import YandexGPTEmbeddings


def test_init() -> None:
    os.environ["YC_API_KEY"] = "foo"
    models = [
        YandexGPTEmbeddings(folder_id="bar"),
        YandexGPTEmbeddings(
            query_model_uri="emb://bar/text-search-query/latest",
            doc_model_uri="emb://bar/text-search-doc/latest",
        ),
        YandexGPTEmbeddings(
            folder_id="bar",
            query_model_name="text-search-query",
            doc_model_name="text-search-doc",
        ),
    ]
    for embeddings in models:
        assert embeddings.model_uri == "emb://bar/text-search-query/latest"
        assert embeddings.doc_model_uri == "emb://bar/text-search-doc/latest"
        assert embeddings.model_name == "text-search-query"
        assert embeddings.doc_model_name == "text-search-doc"