mirror of
https://github.com/hwchase17/langchain
synced 2024-11-10 01:10:59 +00:00
c8391d4ff1
Fix of YandexGPT embeddings.

The current version uses a single `model_name` for queries and documents, essentially making the `embed_documents` and `embed_query` methods the same. Yandex has a different endpoint (`model_uri`) for encoding documents, see [this](https://yandex.cloud/en/docs/yandexgpt/concepts/embeddings). The bug may impact retrievers built with `YandexGPTEmbeddings` (for instance, a FAISS database used as a retriever), since they use both `embed_documents` and `embed_query`.

A simple snippet to test the behaviour:

```python
from langchain_community.embeddings.yandex import YandexGPTEmbeddings

embeddings = YandexGPTEmbeddings()
q_emb = embeddings.embed_query('hello world')
doc_emb = embeddings.embed_documents(['hello world', 'hello world'])
q_emb == doc_emb[0]
```

The response is `True` with the current version and `False` with the changes I made.

Twitter: @egor_krash

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
25 lines
844 B
Python
import os

from langchain_community.embeddings import YandexGPTEmbeddings


def test_init() -> None:
    os.environ["YC_API_KEY"] = "foo"
    models = [
        YandexGPTEmbeddings(folder_id="bar"),
        YandexGPTEmbeddings(
            query_model_uri="emb://bar/text-search-query/latest",
            doc_model_uri="emb://bar/text-search-doc/latest",
        ),
        YandexGPTEmbeddings(
            folder_id="bar",
            query_model_name="text-search-query",
            doc_model_name="text-search-doc",
        ),
    ]
    for embeddings in models:
        assert embeddings.model_uri == "emb://bar/text-search-query/latest"
        assert embeddings.doc_model_uri == "emb://bar/text-search-doc/latest"
        assert embeddings.model_name == "text-search-query"
        assert embeddings.doc_model_name == "text-search-doc"
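
The assertions above imply that a model URI has the form `emb://<folder_id>/<model_name>/<version>`, and that an explicitly passed URI takes precedence over one built from `folder_id` and a model name. A minimal sketch of that resolution follows. The helper `resolve_model_uri` is hypothetical and is not part of `langchain_community`; the `"latest"` default is an assumption taken from the URIs in the test, and the actual class may validate its inputs differently.

```python
from typing import Optional


def resolve_model_uri(
    folder_id: str,
    model_name: str,
    model_uri: Optional[str] = None,
    model_version: str = "latest",
) -> str:
    """Hypothetical helper: return an explicit URI, or build one from its parts."""
    if model_uri:
        # An explicitly supplied URI is used unchanged.
        return model_uri
    if not folder_id:
        # Without a folder ID there is nothing to build the URI from.
        raise ValueError("Either model_uri or folder_id must be provided.")
    return f"emb://{folder_id}/{model_name}/{model_version}"


# Queries and documents resolve to different URIs, which is the point of the fix.
query_uri = resolve_model_uri("bar", "text-search-query")
doc_uri = resolve_model_uri("bar", "text-search-doc")
explicit_uri = resolve_model_uri(
    "", "text-search-doc", model_uri="emb://bar/text-search-doc/latest"
)
```

Under these assumptions, `query_uri` and `doc_uri` differ only in the model name segment, mirroring the two URIs the test checks.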