Commit Graph

728 Commits

Author SHA1 Message Date
Tim Asp
23231d65a9
Add PyMuPDF PDF loader (#1426)
Different PDF libraries have different strengths and weaknesses. PyMuPDF
does a good job at extracting the most amount of content from the doc,
regardless of the source quality, extremely fast (especially compared to
Unstructured).

https://pymupdf.readthedocs.io/en/latest/index.html
2023-03-03 20:59:28 -08:00
3d54b05863
searx: add install instructions, update doc and notebooks (#1420)
- Added instructions on setting up self hosted searx
- Add notebook example with agent
- Use `localhost:8888` as example url to stay consistent since public
instances are not really usable.

Co-authored-by: blob42 <spike@w530>
2023-03-03 20:57:50 -08:00
Tim Asp
bca0935d90
[docs] fix minor import error (#1425) 2023-03-03 16:10:07 -08:00
Jon Luo
882f7964fb
fix sql misinterpretation of % in query (#1408)
% is being misinterpreted by sqlalchemy as parameter passing, so any
`LIKE 'asdf%'` will result in a value error with mysql, mariadb, and
maybe some others. This is one way to fix it - the alternative is to
simply double up %, like `LIKE 'asdf%%'` but this seemed cleaner in
terms of output.
Fixes #1383
2023-03-02 16:03:16 -08:00
JonLuca De Caro
443992c4d5
[Docs] Add missing word from prompt docs (#1406)
The prompt in the first example of the quickstart guide was missing `for
`
2023-03-02 16:02:54 -08:00
Eugene Yurtsev
a83a371069
Minor documentation update in initialize_agent (#1397)
Updating documentation in initialize_agent.

One thing that could benefit from further clarification is the
responsibility
breakdown by between an AgentExecutor vs. an Agent. The documentation
for an
AgentExecutor does not clarify that. From the class attributes, it
appears that
executor has access to the tools, while the agent is only aware of the
tool
names. Anyway, additional clarification would be beneficial on the
AgentExecutor class.
2023-03-02 11:46:35 -08:00
Nuno Campos
499e76b199
Allow the regular openai class to be used for ChatGPT models (#1393)
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-03-02 09:04:18 -08:00
Kacper Łukawski
8947797250
Return Cohere embeddings as lists of floats (#1394)
This PR fixes the types returned by Cohere embeddings. Currently, Cohere
client returns instances of `cohere.embeddings.Embeddings`. Since the
transport layer relies on JSON, some numbers might be represented as
ints, not floats, which happens quite often. While that doesn't seem to
be an issue, it breaks some pydantic models if they require strict
floats.
2023-03-02 09:02:10 -08:00
Jason Gill
1989e7d4c2
Update examples to prevent confusing missing _type warning (#1391)
The YAML and JSON examples of prompt serialization now give a strange
`No '_type' key found, defaulting to 'prompt'` message when you try to
run them yourself or copy the format of the files. The reason for this
harmless warning is that the _type key was not in the config files,
which means they are parsed as a standard prompt.

This could be confusing to new users (like it was confusing to me after
upgrading from 0.0.85 to 0.0.86+ for my few_shot prompts that needed a
_type added to the example_prompt config), so this update includes the
_type key just for clarity.

Obviously this is not critical as the warning is harmless, but it could
be confusing to track down or be interpreted as an error by a new user,
so this update should resolve that.
2023-03-02 07:39:57 -08:00
Harrison Chase
dda5259f68
bump version to 0.0.99 (#1390) 2023-03-02 07:25:59 -08:00
Kacper Łukawski
f032609f8d
Add recursive parameter to DirectoryLoader (#1389)
This PR allows loading a directory recursively.
2023-03-02 07:06:26 -08:00
Kacper Łukawski
9ac442624c
Add Qdrant named arguments (#1386)
This PR:
- Increases `qdrant-client` version to 1.0.4
- Introduces custom content and metadata keys (as requested in #1087)
- Moves all the `QdrantClient` parameters into the method parameters to
simplify code completion
2023-03-02 07:05:14 -08:00
Francisco Ingham
34abcd31b9
remove limit clause from prompt for compatibility with ms sql server (#1385)
For reference see:
8a35811556

Co-authored-by: Francisco Ingham <>
2023-03-02 07:02:42 -08:00
Ankush Gola
fe30be6fba
add async and streaming support to OpenAIChat (#1378)
title says it all
2023-03-01 21:55:43 -08:00
Lakshya Agarwal
cfed0497ac
Minor grammatical fixes (#1325)
Fixed typos and links in a few places across documents
2023-03-01 21:18:09 -08:00
Ryan Dao
59157b6891
Bug: Fix Python version validation in PythonAstREPLTool (#1373)
The current logic checks if the Python major version is < 8, which is
wrong. This checks if the major and minor version is < 3.9.
2023-03-01 21:15:27 -08:00
Harrison Chase
e178008b75
Harrison/track token usage (#1382)
Co-authored-by: Zak King <zaking17@gmail.com>
2023-03-01 21:15:13 -08:00
Harrison Chase
1cd8996074
Harrison/summarizer chain (#1356)
Co-authored-by: Tim Asp <707699+timothyasp@users.noreply.github.com>
2023-03-01 20:59:07 -08:00
yakigac
cfae03042d
Fix the openaichat example (#1377)
The example was wrong.
2023-03-01 18:24:32 -08:00
Harrison Chase
4b5e850361
chatgpt wrapper (#1367) 2023-03-01 11:47:01 -08:00
Harrison Chase
4d4b43cf5a
fix doc names (#1354) 2023-03-01 09:40:31 -08:00
Harrison Chase
c01f9100e4
bump version to 0097 (#1365) 2023-03-01 08:20:24 -08:00
Christie Jacob
edb3915ee7
typo in vectorstores (#1362) 2023-03-01 07:21:37 -08:00
Harrison Chase
fe7dbecfe6
pandas and csv agents (#1353) 2023-02-28 22:19:11 -08:00
Harrison Chase
02ec72df87
improve docs (#1351) 2023-02-28 21:37:18 -08:00
Jon Luo
92ab27e4b8
sql doc formatting (#1350)
My bad, missed a few tabs between the two PRs
2023-02-28 19:54:46 -08:00
Ankush Gola
82baecc892
Add a SQL agent for interacting with SQL Databases and JSON Agent for interacting with large JSON blobs (#1150)
This PR adds 

* `ZeroShotAgent.as_sql_agent`, which returns an agent for interacting
with a sql database. This builds off of `SQLDatabaseChain`. The main
advantages are 1) answering general questions about the db, 2) access to
a tool for double checking queries, and 3) recovering from errors
* `ZeroShotAgent.as_json_agent` which returns an agent for interacting
with json blobs.
* Several examples in notebooks

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-02-28 19:44:39 -08:00
Jon Luo
35f1e8f569
separate columns by tabs instead of single space in sql sample rows (#1348)
Use tabs to separate columns instead of a single space - confusing when
there are spaces in a cell
2023-02-28 18:59:53 -08:00
kurehajime
6c629b54e6
Fixed arguments passed to InvalidTool.run(). (#1340)
[InvalidTool.run()](72ef69d1ba/langchain/agents/tools.py (L43))
returns "{arg}is not a valid tool, try another one.".
However, no function name is actually given in the argument.
This causes LLM to be stuck in a loop, unable to find the right tool.

This may resolve these Issues.
https://github.com/hwchase17/langchain/issues/998
https://github.com/hwchase17/langchain/issues/702
2023-02-28 18:58:23 -08:00
James Brotchie
3574418a40
Fix link in summarization.md (#1344)
"Utilities for working with Documents" was linking to a non-useful page.
Re-linked to the utils page that includes info about working with docs.
2023-02-28 18:58:12 -08:00
Jon Luo
5bf8772f26
add option to use user-defined SQL table info (#1347)
Currently, table information is gathered through SQLAlchemy as complete
table DDL and a user-selected number of sample rows from each table.
This PR adds the option to use user-defined table information instead of
automatically collecting it. This will use the provided table
information and fall back to the automatic gathering for tables that the
user didn't provide information for.

Off the top of my head, there are a few cases where this can be quite
useful:
- The first n rows of a table are uninformative, or very similar to one
another. In this case, hand-crafting example rows for a table such that
they provide the good, diverse information can be very helpful. Another
approach we can think about later is getting a random sample of n rows
instead of the first n rows, but there are some performance
considerations that need to be taken there. Even so, hand-crafting the
sample rows is useful and can guarantee the model sees informative data.
- The user doesn't want every column to be available to the model. This
is not an elegant way to fulfill this specific need since the user would
have to provide the table definition instead of a simple list of columns
to include or ignore, but it does work for this purpose.
- For the developers, this makes it a lot easier to compare/benchmark
the performance of different prompting structures for providing table
information in the prompt.

These are cases I've run into myself (particularly cases 1 and 3) and
I've found these changes useful. Personally, I keep custom table info
for a few tables in a yaml file for versioning and easy loading.

Definitely open to other opinions/approaches though!
2023-02-28 18:58:04 -08:00
Harrison Chase
924bba5ce9
bump version (#1342) 2023-02-28 08:48:32 -08:00
Harrison Chase
786852e9e6
partial variables (#1308) 2023-02-28 08:40:35 -08:00
Tim Asp
72ef69d1ba
Add new iFixit document loader (#1333)
iFixit is a wikipedia-like site that has a huge amount of open content
on how to fix things, questions/answers for common troubleshooting and
"things" related content that is more technical in nature. All content
is licensed under CC-BY-SA-NC 3.0

Adding docs from iFixit as context for user questions like "I dropped my
phone in water, what do I do?" or "My macbook pro is making a whining
noise, what's wrong with it?" can yield significantly better responses
than context free response from LLMs.
2023-02-27 20:40:20 -08:00
Matt Robinson
1aa41b5741
feat: document loader for image files (#1330)
### Summary

Adds a document loader for image files such as `.jpg` and `.png` files.

### Testing

Run the following using the example document from the [`unstructured`
repo](https://github.com/Unstructured-IO/unstructured/tree/main/example-docs).

```python
from langchain.document_loaders.image import UnstructuredImageLoader

loader = UnstructuredImageLoader("layout-parser-paper-fast.jpg")
loader.load()
```
2023-02-27 14:43:32 -08:00
Eugene Yurtsev
c14cff60d0
Documentation: Minor typo fixes (#1327)
Fixing a few minor typos in the documentation (and likely introducing
other
ones in the process).
2023-02-27 14:40:43 -08:00
Harrison Chase
f61858163d
bump version to 0.0.95 (#1324) 2023-02-27 07:45:54 -08:00
Harrison Chase
0824d65a5c
Harrison/indexing pipeline (#1317) 2023-02-27 00:31:36 -08:00
Akshay
a0bf856c70
Update agent_vectorstore.ipynb (#1318)
nitpicking but just thought i'd add this typo which I found when going
through the How-to 😄 (unless it was intentional) also, it's amazing that
you added ReAct to LangChain!
2023-02-26 23:22:35 -08:00
Harrison Chase
166cda2cc6
Harrison/deeplake (#1316)
Co-authored-by: Davit Buniatyan <d@activeloop.ai>
2023-02-26 22:35:04 -08:00
Harrison Chase
aaad6cc954
Harrison/atlas db (#1315)
Co-authored-by: Brandon Duderstadt <brandonduderstadt@gmail.com>
2023-02-26 22:11:38 -08:00
Marc Puig
3989c793fd
Making it possible to use "certainty" as a parameter for the weaviate similarity_search (#1218)
Checking if weaviate similarity_search kwargs contains "certainty" and
use it accordingly. The minimal level of certainty must be a float, and
it is computed by normalized distance.
2023-02-26 17:55:28 -08:00
Alexander Hoyle
42b892c21b
Avoid IntegrityError for SQLiteCache updates (#1286)
While using a `SQLiteCache`, if there are duplicate `(prompt, llm, idx)`
tuples passed to
[`update_cache()`](c5dd491a21/langchain/llms/base.py (L39)),
then an `IntegrityError` is thrown. This can happen when there are
duplicated prompts within the same batch.

This PR changes the SQLAlchemy `session.add()` to a `session.merge()` in
`cache.py`, [following the solution from this SO
thread](https://stackoverflow.com/questions/10322514/dealing-with-duplicate-primary-keys-on-insert-in-sqlalchemy-declarative-style).
I believe this fixes #983, but not entirely sure since that also
involves async

Here's a minimal example of the error:
```python
from pathlib import Path

import langchain
from langchain.cache import SQLiteCache

llm = langchain.OpenAI(model_name="text-ada-001", openai_api_key=Path("/.openai_api_key").read_text().strip())
langchain.llm_cache = SQLiteCache("test_cache.db")
llm.generate(['a'] * 5)
```
```
>   IntegrityError: (sqlite3.IntegrityError) UNIQUE constraint failed: full_llm_cache.prompt, full_llm_cache.llm, full_llm_cache.idx
    [SQL: INSERT INTO full_llm_cache (prompt, llm, idx, response) VALUES (?, ?, ?, ?)]
    [parameters: ('a', "[('_type', 'openai'), ('best_of', 1), ('frequency_penalty', 0), ('logit_bias', {}), ('max_tokens', 256), ('model_name', 'text-ada-001'), ('n', 1), ('presence_penalty', 0), ('request_timeout', None), ('stop', None), ('temperature', 0.7), ('top_p', 1)]", 0, '\n\nA is for air.\n\nA is for atmosphere.')]
    (Background on this error at: https://sqlalche.me/e/14/gkpj)
```

After the change, we now have the following
```python
class Output:
    def __init__(self, text):
        self.text = text

# make dummy data
cache = SQLiteCache("test_cache_2.db")
cache.update(prompt="prompt_0", llm_string="llm_0", return_val=[Output("text_0")])
cache.engine.execute("SELECT * FROM full_llm_cache").fetchall()

# output
>   [('prompt_0', 'llm_0', 0, 'text_0')]
```

```python
#  update data, before change this would have thrown an `IntegrityError`
cache.update(prompt="prompt_0", llm_string="llm_0", return_val=[Output("text_0_new")])
cache.engine.execute("SELECT * FROM full_llm_cache").fetchall()

# output
>   [('prompt_0', 'llm_0', 0, 'text_0_new')]
```
2023-02-26 17:54:43 -08:00
Harrison Chase
81abcae91a
Harrison/banana fix (#1311)
Co-authored-by: Erik Dunteman <44653944+erik-dunteman@users.noreply.github.com>
2023-02-26 17:53:57 -08:00
Casey A. Fitzpatrick
648b3b3909
Fix use case sentence for bash util doc (#1295)
Thanks for all your hard work!

I noticed a small typo in the bash util doc so here's a quick update.
Additionally, my formatter caught some spacing in the `.md` as well.
Happy to revert that if it's an issue.

The main change is just
```
- A common use case this is for letting it interact with your local file system. 

+ A common use case for this is letting the LLM interact with your local file system.
```

## Testing

`make docs_build` succeeds locally and the changes show as expected ✌️ 
<img width="704" alt="image"
src="https://user-images.githubusercontent.com/17773666/221376160-e99e59a6-b318-49d1-a1d7-89f5c17cdab4.png">
2023-02-26 17:41:03 -08:00
Ingo Kleiber
fd9975dad7
add CoNLL-U document loader (#1297)
I've added a simple
[CoNLL-U](https://universaldependencies.org/format.html) document
loader. CoNLL-U is a common format for NLP tasks and is used, for
example, in the Universal Dependencies treebank corpora. The loader
reads a single file in standard CoNLL-U format and returns a document.
2023-02-26 17:27:00 -08:00
Harrison Chase
d29f74114e
copy paste loader (#1302) 2023-02-26 17:26:37 -08:00
Harrison Chase
ce441edd9c
improve docs (#1309) 2023-02-26 11:25:16 -08:00
Harrison Chase
6f30d68581
add example of using agent with vectorstores (#1285) 2023-02-25 13:27:24 -08:00
Harrison Chase
002da6edc0
ruff ruff (#1203) 2023-02-25 08:59:52 -08:00