Ran into a broken build if bs4 wasn't installed in the project.
Minor tweak to follow the other doc loaders optional package-loading
conventions.
Also updated html docs to include reference to this new html loader.
side note: Should there be 2 different html-to-text document loaders?
This new one only handles local files, while the existing unstructured
html loader handles HTML from local and remote. So it seems like the
improvement was adding the title to the metadata, which is useful but
could also be added to `html.py`
Fixed a typo in the argument of the query method within the
VectorStoreIndexWrapper class. Specifically, the argument `retriver` has
been changed to `retriever`. With this correction, the correct argument
name is used, and potential bugs are avoided.
# Description
Add `drop_index` for redis
RediSearch: [RediSearch quick
start](https://redis.io/docs/stack/search/quick_start/)
# How to use
```
from langchain.vectorstores.redis import Redis
Redis.drop_index(index_name="doc",delete_documents=False)
```
Technically a duplicate fix to #1619 but with unit tests and a small
documentation update
- Propagate `filter` arg in Chroma `similarity_search` to delegated call
to `similarity_search_with_score`
- Add `filter` arg to `similarity_search_by_vector`
- Clarify doc strings on FakeEmbeddings
In https://github.com/hwchase17/langchain/issues/1716 , it was
identified that there were two .py files performing similar tasks. As a
resolution, one of the files has been removed, as its purpose had
already been fulfilled by the other file. Additionally, the init has
been updated accordingly.
Furthermore, the how_to_guides.rst file has been updated to include
links to documentation that was previously missing. This was deemed
necessary as the existing list on
https://langchain.readthedocs.io/en/latest/modules/document_loaders/how_to_guides.html
was incomplete, causing confusion for users who rely on the full list of
documentation on the left sidebar of the website.
In the langchain.vectorstores.opensearch_vector_search.py, in the
add_texts function, around line 247, we have the following code
```python
embeddings = [
self.embedding_function.embed_documents(list(text))[0] for text in texts
]
```
the goal of the `list(text)` part I believe is to pass a list to the
embed_documents list instead of a a str. However, `list(text)` is a
subtle bug
`list(text)` would convert the string text into an array, where each
element of the array is a character of the string
<img width="937" alt="Screenshot 2023-03-22 at 1 27 18 PM"
src="https://user-images.githubusercontent.com/88190553/226836470-384665a1-2f13-46bc-acfc-9a37417cd918.png">
The correct way should be to change the code to
```python
embeddings = [
self.embedding_function.embed_documents([text])[0] for text in texts
]
```
Which wraps the string inside a list.
The `CollectionStore` for `PGVector` has a `cmetadata` field but it's
never used. This PR add the ability to save metadata information to the
collection.
The GPT Index project is transitioning to the new project name,
LlamaIndex.
I've updated a few files referencing the old project name and repository
URL to the current ones.
From the [LlamaIndex repo](https://github.com/jerryjliu/llama_index):
> NOTE: We are rebranding GPT Index as LlamaIndex! We will carry out
this transition gradually.
>
> 2/25/2023: By default, our docs/notebooks/instructions now reference
"LlamaIndex" instead of "GPT Index".
>
> 2/19/2023: By default, our docs/notebooks/instructions now use the
llama-index package. However the gpt-index package still exists as a
duplicate!
>
> 2/16/2023: We have a duplicate llama-index pip package. Simply replace
all imports of gpt_index with llama_index if you choose to pip install
llama-index.
I'm not associated with LlamaIndex in any way. I just noticed the
discrepancy when studying the lanchain documentation.
# What does this PR do?
This PR adds similar to `llms` a SageMaker-powered `embeddings` class.
This is helpful if you want to leverage Hugging Face models on SageMaker
for creating your indexes.
I added a example into the
[docs/modules/indexes/examples/embeddings.ipynb](https://github.com/hwchase17/langchain/compare/master...philschmid:add-sm-embeddings?expand=1#diff-e82629e2894974ec87856aedd769d4bdfe400314b03734f32bee5990bc7e8062)
document. The example currently includes some `_### TEMPORARY: Showing
how to deploy a SageMaker Endpoint from a Hugging Face model ###_ ` code
showing how you can deploy a sentence-transformers to SageMaker and then
run the methods of the embeddings class.
@hwchase17 please let me know if/when i should remove the `_###
TEMPORARY: Showing how to deploy a SageMaker Endpoint from a Hugging
Face model ###_` in the description i linked to a detail blog on how to
deploy a Sentence Transformers so i think we don't need to include those
steps here.
I also reused the `ContentHandlerBase` from
`langchain.llms.sagemaker_endpoint` and changed the output type to `any`
since it is depending on the implementation.
Fixes the import typo in the vector db text generator notebook for the
chroma library
Co-authored-by: Anupam <anupam@10-16-252-145.dynapool.wireless.nyu.edu>
I was getting the same issue reported in #1339 by
[MacYang555](https://github.com/MacYang555) when running the test suite
on my Mac. I implemented the fix they suggested to use a regex match in
the output assertion for the scenario under test.
Resolves#1339
Use the following code to test:
```python
import os
from langchain.llms import OpenAI
from langchain.chains.api import podcast_docs
from langchain.chains import APIChain
# Get api key here: https://openai.com/pricing
os.environ["OPENAI_API_KEY"] = "sk-xxxxx"
# Get api key here: https://www.listennotes.com/api/pricing/
listen_api_key = 'xxx'
llm = OpenAI(temperature=0)
headers = {"X-ListenAPI-Key": listen_api_key}
chain = APIChain.from_llm_and_api_docs(llm, podcast_docs.PODCAST_DOCS, headers=headers, verbose=True)
chain.run("Search for 'silicon valley bank' podcast episodes, audio length is more than 30 minutes, return only 1 results")
```
Known issues: the api response data might be too big, and we'll get such
error:
`openai.error.InvalidRequestError: This model's maximum context length
is 4097 tokens, however you requested 6733 tokens (6477 in your prompt;
256 for the completion). Please reduce your prompt; or completion
length.`
When following the Quick Start instructions in the contributing docs, I
was getting a "WheelFileValidationError" on installation of debugpy
which was blocking the installation of a number of other deps. Google
turned up this [GitHub
issue](https://github.com/microsoft/debugpy/issues/1246) indicating a
regression in Poetry 1.4.1 and workarounds.
This PR updates the contrib docs noting the issue and the workarounds.