Fix minor bug in opensearch vector store add_texts function (#1878)

In the langchain.vectorstores.opensearch_vector_search.py, in the
add_texts function, around line 247, we have the following code

```python
embeddings = [
     self.embedding_function.embed_documents(list(text))[0] for text in texts
]
```

the goal of the `list(text)` part I believe is to pass a list to the
embed_documents list instead of a a str. However, `list(text)` is a
subtle bug

`list(text)` would convert the string text into an array, where each
element of the array is a character of the string

<img width="937" alt="Screenshot 2023-03-22 at 1 27 18 PM"
src="https://user-images.githubusercontent.com/88190553/226836470-384665a1-2f13-46bc-acfc-9a37417cd918.png">

The correct way should be to change the code to 

```python
embeddings = [
      self.embedding_function.embed_documents([text])[0] for text in texts
]
```
Which wraps the string inside a list.
This commit is contained in:
Kushal Chordiya 2023-03-22 23:57:32 +05:30 committed by GitHub
parent 2212520a6c
commit ff4a25b841
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

View File

@ -243,7 +243,7 @@ class OpenSearchVectorSearch(VectorStore):
List of ids from adding the texts into the vectorstore.
"""
embeddings = [
self.embedding_function.embed_documents(list(text))[0] for text in texts
self.embedding_function.embed_documents([text])[0] for text in texts
]
_validate_embeddings_and_bulk_size(len(embeddings), bulk_size)
return _bulk_ingest_embeddings(