Pydantic validation breaks tests for example (`test_qdrant.py`) because
fake embeddings contain an integer.
This PR casts the embeddings array to all floats.
Now the `qdrant` test passes, `poetry run pytest
tests/integration_tests/vectorstores/test_qdrant.py`
Implementation fails if there are not enough documents. Added the same
check as used for similarity search.
Current implementation raises
```
File ".venv/lib/python3.9/site-packages/langchain/vectorstores/faiss.py", line 160, in max_marginal_relevance_search
_id = self.index_to_docstore_id[i]
KeyError: -1
```
#1081 introduced a method to get DDL (table definitions) in a manner
specific to sqlite3, thus breaking compatibility with other non-sqlite3
databases. This uses the sqlite3 command if the detected dialect is
sqlite, and otherwise uses the standard SQL `SHOW CREATE TABLE`. This
should fix#1103.
Fix KeyError 'items' when no result found.
## Problem
When no result found for a query, google search crashed with `KeyError
'items'`.
## Solution
I added a check for an empty response before accessing the 'items' key.
It will handle the case correctly.
## Other
my twitter: yakigac
(I don't mind even if you don't mention me for this PR. But just because
last time my real name was shout out :) )
### Summary
Adds support for older `.ppt` file in the PowerPoint loader.
### Testing
The following should work on `unstructured==0.4.11` using the example
docs from the `unstructured` repo.
```python
from langchain.document_loaders import UnstructuredPowerPointLoader
filename = "../unstructured/example-docs/fake-power-point.pptx"
loader = UnstructuredPowerPointLoader(filename)
loader.load()
filename = "../unstructured/example-docs/fake-power-point.ppt"
loader = UnstructuredPowerPointLoader(filename)
loader.load()
```
Now downgrade `unstructured` to version `0.4.10`. The following should
work:
```python
from langchain.document_loaders import UnstructuredPowerPointLoader
filename = "../unstructured/example-docs/fake-power-point.pptx"
loader = UnstructuredPowerPointLoader(filename)
loader.load()
```
and the following should give you a `ValueError` and invite you to
upgrade `unstructured`.
```python
from langchain.document_loaders import UnstructuredPowerPointLoader
filename = "../unstructured/example-docs/fake-power-point.ppt"
loader = UnstructuredPowerPointLoader(filename)
loader.load()
```
https://github.com/hwchase17/langchain/issues/1100
When faiss data and doc.index are created in past versions, error occurs
that say there was no attribute. So I put hasattr in the check as a
simple solution.
However, increasing the number of such checks is not good for
conservatism, so I think there is a better solution.
Also, the code for the batch process was left out, so I put it back in.
This import works fine:
```python
from langchain import Anthropic
```
This import does not:
```python
from langchain import AI21
```
```
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: cannot import name 'AI21' from 'langchain' (/opt/anaconda3/envs/fed_nlp/lib/python3.9/site-packages/langchain/__init__.py)
```
I think there is a slight documentation inconsistency here:
https://langchain.readthedocs.io/en/latest/reference/modules/llms.html
This PR starts to solve that. Should all the import examples be
`from langchain.llms import X` instead of `from langchain import X`?
The #1088 introduced a bug in Qdrant integration. That PR reverts those
changes and provides class attributes to ensure consistent payload keys.
In addition to that, an exception will be thrown if any of texts is None
(that could have been an issue reported in #1087)
This is a work in progress PR to track my progres.
## TODO:
- [x] Get results using the specifed searx host
- [x] Prioritize returning an `answer` or results otherwise
- [ ] expose the field `infobox` when available
- [ ] expose `score` of result to help agent's decision
- [ ] expose the `suggestions` field to agents so they could try new
queries if no results are found with the orignial query ?
- [ ] Dynamic tool description for agents ?
- Searx offers many engines and a search syntax that agents can take
advantage of. It would be nice to generate a dynamic Tool description so
that it can be used many times as a tool but for different purposes.
- [x] Limit number of results
- [ ] Implement paging
- [x] Miror the usage of the Google Search tool
- [x] easy selection of search engines
- [x] Documentation
- [ ] update HowTo guide notebook on Search Tools
- [ ] Handle async
- [ ] Tests
### Add examples / documentation on possible uses with
- [ ] getting factual answers with `!wiki` option and `infoboxes`
- [ ] getting `suggestions`
- [ ] getting `corrections`
---------
Co-authored-by: blob42 <spike@w530>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Alternate implementation to PR #960 Again - only FAISS is implemented.
If accepted can add this to other vectorstores or leave as
NotImplemented? Suggestions welcome...
This PR updates `PromptLayerOpenAI` to now support requests using the
[Async
API](https://langchain.readthedocs.io/en/latest/modules/llms/async_llm.html)
It also updates the documentation on Async API to let users know that
PromptLayerOpenAI also supports this.
`PromptLayerOpenAI` now redefines `_agenerate` a similar was to how it
redefines `_generate`
Adds Google Search integration with [Serper](https://serper.dev) a
low-cost alternative to SerpAPI (10x cheaper + generous free tier).
Includes documentation, tests and examples. Hopefully I am not missing
anything.
Developers can sign up for a free account at
[serper.dev](https://serper.dev) and obtain an api key.
## Usage
```python
from langchain.utilities import GoogleSerperAPIWrapper
from langchain.llms.openai import OpenAI
from langchain.agents import initialize_agent, Tool
import os
os.environ["SERPER_API_KEY"] = ""
os.environ['OPENAI_API_KEY'] = ""
llm = OpenAI(temperature=0)
search = GoogleSerperAPIWrapper()
tools = [
Tool(
name="Intermediate Answer",
func=search.run
)
]
self_ask_with_search = initialize_agent(tools, llm, agent="self-ask-with-search", verbose=True)
self_ask_with_search.run("What is the hometown of the reigning men's U.S. Open champion?")
```
### Output
```
Entering new AgentExecutor chain...
Yes.
Follow up: Who is the reigning men's U.S. Open champion?
Intermediate answer: Current champions Carlos Alcaraz, 2022 men's singles champion.
Follow up: Where is Carlos Alcaraz from?
Intermediate answer: El Palmar, Spain
So the final answer is: El Palmar, Spain
> Finished chain.
'El Palmar, Spain'
```
This PR updates the usage instructions for PromptLayerOpenAI in
Langchain's documentation. The updated instructions provide more detail
and conform better to the style of other LLM integration documentation
pages.
No code changes were made in this PR, only improvements to the
documentation. This update will make it easier for users to understand
how to use `PromptLayerOpenAI`
### Summary
Adds tracked metadata from `unstructured` elements to the document
metadata when `UnstructuredFileLoader` is used in `"elements"` mode.
Tracked metadata is available in `unstructured>=0.4.9`, but the code is
written for backward compatibility with older `unstructured` versions.
### Testing
Before running, make sure to upgrade to `unstructured==0.4.9`. In the
code snippet below, you should see `page_number`, `filename`, and
`category` in the metadata for each document. `doc[0]` should have
`page_number: 1` and `doc[-1]` should have `page_number: 2`. The example
document is `layout-parser-paper-fast.pdf` from the [`unstructured`
sample
docs](https://github.com/Unstructured-IO/unstructured/tree/main/example-docs).
```python
from langchain.document_loaders import UnstructuredFileLoader
loader = UnstructuredFileLoader(file_path=f"layout-parser-paper-fast.pdf", mode="elements")
docs = loader.load()
```
Currently the chain is getting the column names and types on the one
side and the example rows on the other. It is easier for the llm to read
the table information if the column name and examples are shown together
so that it can easily understand to which columns do the examples refer
to. For an instantiation of this, please refer to the changes in the
`sqlite.ipynb` notebook.
Also changed `eval` for `ast.literal_eval` when interpreting the results
from the sample row query since it is a better practice.
---------
Co-authored-by: Francisco Ingham <>
---------
Co-authored-by: Francisco Ingham <fpingham@gmail.com>
This PR adds persistence to the Chroma vector store.
Users can supply a `persist_directory` with any of the `Chroma` creation
methods. If supplied, the store will be automatically persisted at that
directory.
If a user creates a new `Chroma` instance with the same persistence
directory, it will get loaded up automatically. If they use `from_texts`
or `from_documents` in this way, the documents will be loaded into the
existing store.
There is the chance of some funky behavior if the user passes a
different embedding function from the one used to create the collection
- we will make this easier in future updates. For now, we log a warning.