Commit Graph

967 Commits

Author SHA1 Message Date
Xin Qiu
4e13cef05a
feat: add redisearch vectorstore (#1307)
# Description

Add `RediSearch` vectorstore for LangChain

RediSearch: [RediSearch quick
start](https://redis.io/docs/stack/search/quick_start/)

# How to use

```
from langchain.vectorstores.redisearch import RediSearch

rds = RediSearch.from_documents(docs, embeddings,redisearch_url="redis://localhost:6379")
```
2023-03-14 18:06:03 -07:00
Harrison Chase
e5c1659864
bump ver (#1668) 2023-03-14 13:05:17 -07:00
Harrison Chase
2d098e8869
Harrison/agent eval (#1620)
Co-authored-by: jerwelborn <jeremy.welborn@gmail.com>
2023-03-14 12:37:48 -07:00
Harrison Chase
8965a2f0af
bump and hotfix (#1665) 2023-03-14 11:12:53 -07:00
Harrison Chase
e222ea4ee8
update rtd config (#1664) 2023-03-14 10:40:06 -07:00
Harrison Chase
e326939759
bump version 110 (#1662) 2023-03-14 10:21:35 -07:00
Harrison Chase
7cf46b3fee
Harrison/convo agent (#1642) 2023-03-14 09:42:24 -07:00
Abhinav Upadhyay
84cd825a0e
Add a batch_size param to the add_texts API of pinecone wrapper (#1658)
A safe default value of batch_size is required by the pinecone python
client otherwise if the user of add_texts passes too many documents in a
single call, they would get a 400 error from pinecone.
2023-03-14 09:40:22 -07:00
Jon Luo
0a1b1806e9
sql: do not hard code the LIMIT clause in the table_info section (#1563)
Seeing a lot of issues in Discord in which the LLM is not using the
correct LIMIT clause for different SQL dialects. ie, it's using `LIMIT`
for mssql instead of `TOP`, or instead of `ROWNUM` for Oracle, etc.
I think this could be due to us specifying the LIMIT statement in the
example rows portion of `table_info`. So the LLM is seeing the `LIMIT`
statement used in the prompt.
Since we can't specify each dialect's method here, I think it's fine to
just replace the `SELECT... LIMIT 3;` statement with `3 rows from
table_name table:`, and wrap everything in a block comment directly
following the `CREATE` statement. The Rajkumar et al paper wrapped the
example rows and `SELECT` statement in a block comment as well anyway.
Thoughts @fpingham?
2023-03-13 23:08:27 -07:00
Brian Thorne
9ee2713272
Bugfix - allow custom input variables in chat zero shot agent's prompt (#1624)
I was trying out the `chat-zero-shot-react-description` agent for
[qabot](dbbd31bb27/qabot/agents/data_query_chain.py (L35-L52))
but langchain 0.0.108 doesn't correctly use custom 'input_variables` in
the prompt template.
2023-03-13 23:07:35 -07:00
Tim Asp
b3234bf3b0
cleanup: unify 3 different pdf loaders, rename PagedPDFSplitter (#1615)
`OnlinePDFLoader` and `PagedPDFSplitter` lived separate from the rest of
the pdf loaders.

Because they're all similar, I propose moving all to `pdy.py` and the
same docs/examples page.

Additionally, `PagedPDFSplitter` naming doesn't match the pattern the
rest of the loaders follow, so I renamed to `PyPDFLoader` and had it
inherit from `BasePDFLoader` so it can now load from remote file
sources.
2023-03-13 23:06:50 -07:00
Luis
562d9891ea
Add regex dict: (#1616)
This class enables us to send a dictionary containing an output key and
the expected format, which in turn allows us to retrieve the result of
the matching formats and extract specific information from it.

To exclude irrelevant information from our return dictionary, we can
prompt the LLM to use a specific command that notifies us when it
doesn't know the answer. We refer to this variable as the
"no_update_value".

Regarding the updated regular expression pattern
(r"{}:\s?([^.'\n']*).?"), it enables us to retrieve a format as 'Output
Key':'value'.

We have improved the regex by adding an optional space between ':' and
'value' with "s?", and by excluding points and line jumps from the
matches using "[^.'\n']*".
2023-03-13 23:05:39 -07:00
Harrison Chase
56aff797c0
docs req (#1647) 2023-03-13 16:03:32 -07:00
Harrison Chase
d53ff270e0
bump version to 109 (#1646) 2023-03-13 15:52:35 -07:00
Harrison Chase
df6c33d4b3
Harrison/new output parser (#1617) 2023-03-13 15:08:39 -07:00
Dennis Aumiller
039d05c808
Update types in cohere.py (#1635)
Adjust argument type and clarification on parameter limits for
attributes `frequency_penalty` and `presence_penalty`.
2023-03-13 09:08:32 -07:00
Harrison Chase
aed9f9febe
Harrison/return intermediate (#1633)
Co-authored-by: Mario Kostelac <mario@intercom.io>
2023-03-13 07:54:29 -07:00
Harrison Chase
72b461e257
improve chat error (#1632) 2023-03-13 07:43:44 -07:00
Peng Qu
cb646082ba
remove an extra whitespace (#1625) 2023-03-13 07:27:21 -07:00
Eugene Yurtsev
bd4a2a670b
Add copy button to sphinx notebooks (#1622)
This adds a copy button at the top right corner of all notebook cells in
sphinx
notebooks.
2023-03-12 21:15:07 -07:00
Ikko Eltociear Ashimine
6e98ab01e1
Fix typo in vectorstore.ipynb (#1614)
Initalize -> Initialize
2023-03-12 14:12:47 -07:00
Harrison Chase
c0ad5d13b8
bump to version 108 (#1613) 2023-03-12 09:50:45 -07:00
yakigac
acd86d33bc
Add read only shared memory (#1491)
Provide shared memory capability for the Agent.
Inspired by #1293 .

## Problem

If both Agent and Tools (i.e., LLMChain) use the same memory, both of
them will save the context. It can be annoying in some cases.


## Solution

Create a memory wrapper that ignores the save and clear, thereby
preventing updates from Agent or Tools.
2023-03-12 09:34:36 -07:00
Abhinav Upadhyay
9707eda83c
Fix docstring of FAISS constructor (#1611) 2023-03-12 09:31:40 -07:00
Kayvane Shakerifar
7e550df6d4
feat: add lookup index to csv loader to make retrieving the original … (#1612)
feat: add lookup index to csv loader to make retrieving the original csv
information easier using theDocument properties
2023-03-12 09:29:27 -07:00
Harrison Chase
c9b5a30b37
move output parsing (#1605) 2023-03-11 16:41:03 -08:00
Harrison Chase
cb04ba0136
Add support for intermediate steps to SQLDatabaseSequentialChain (#1583) (#1601)
for https://github.com/hwchase17/langchain/issues/1582

I simply added the `return_intermediate_steps` and changed the
`output_keys` function.

I added 2 simple tests, 1 for SQLDatabaseSequentialChain without the
intermediate steps and 1 with

Co-authored-by: brad-nemetski <115185478+brad-nemetski@users.noreply.github.com>
2023-03-11 15:44:41 -08:00
Harrison Chase
5903a93f3d
add convinence method to call chat model as an llm (#1604) 2023-03-11 15:04:57 -08:00
Harrison Chase
15de3e8137
Harrison/docs footer (#1600)
Co-authored-by: Albert Avetisian <albert.avetisian@gmail.com>
2023-03-11 09:18:35 -08:00
Harrison Chase
f95d551f7a
Harrison/shallow metadata (#1599)
Co-authored-by: Jesse Zhang <jessetanzhang@gmail.com>
2023-03-11 09:18:25 -08:00
Harrison Chase
c6bfa00178
bump version to 107 (#1590) 2023-03-10 15:39:30 -08:00
Tim Asp
01a57198b8
[bugfix] Fix persisted chromadb vectorstore (#1444)
If a `persist_directory` param was set, chromadb would throw a warning
that ""No embedding_function provided, using default embedding function:
SentenceTransformerEmbeddingFunction". and would error with a `Illegal
instruction: 4` error.

This is on a MBP M1 13.2.1, python 3.9.

I'm not entirely sure why that error happened, but when using
`get_or_create_collection` instead of `list_collection` on our end, the
error and warning goes away and chroma works as expected.

Added bonus this is cleaner and likely more efficient.
`list_collections` builds a new `Collection` instance for each collect,
then `Chroma` would just use the `name` field to tell if the collection
existed.
2023-03-10 15:14:35 -08:00
Harrison Chase
8dba30f31e
Harrison/kwargs loaders (#1588)
Co-authored-by: Tim Asp <707699+timothyasp@users.noreply.github.com>
2023-03-10 15:05:06 -08:00
Harrison Chase
9f78717b3c
Harrison/callbacks (#1587) 2023-03-10 12:53:09 -08:00
Harrison Chase
90846dcc28
fix chat agent (#1586) 2023-03-10 12:40:37 -08:00
Claus Thomasen
6ed16e13b1
Readded similarity_search_by_vector (#1568)
I am redoing this PR, as I made a mistake by merging the latest changes
into my fork's branch, sorry. This added a bunch of commits to my
previous PR.

This fixes #1451.
2023-03-10 12:40:14 -08:00
Harrison Chase
c1dc784a3d
buffer memory old version (#1581)
bring back an older version of memory since people seem to be using it
more widely
2023-03-10 11:27:15 -08:00
fabi.s
5b0e747f9a
Fix description of UnstructuredURLLoader & UnstructuredHTMLLoader (#1570) 2023-03-10 07:08:58 -08:00
Zach Schillaci
624c72c266
Add wikipedia tool doc (#1579) 2023-03-10 07:07:27 -08:00
Ryan Dao
a950287206
Strip trailing whitespaces in agent's stop sequences (#1566)
Fixes #1489
2023-03-09 16:36:15 -08:00
Tim Asp
30383abb12
Add CSVLoader document loader (#1573)
Simple CSV document loader which wraps `csv` reader, and preps the file
with a single `Document` per row.

The column header is prepended to each value for context which is useful
for context with embedding and semantic search
2023-03-09 16:35:18 -08:00
Zach Schillaci
cdb97f3dfb
Add Wikipedia search utility and tool (#1561)
The Python `wikipedia` package gives easy access for searching and
fetching pages from Wikipedia, see https://pypi.org/project/wikipedia/.
It can serve as an additional search and retrieval tool, like the
existing Google and SerpAPI helpers, for both chains and agents.
2023-03-09 16:34:39 -08:00
Felix Altenberger
b44c8bd969
Add optional base_url arg to GitbookLoader (#1552)
First of all, big kudos on what you guys are doing, langchain is
enabling some really amazing usecases and I'm having lot's of fun
playing around with it. It's really cool how many data sources it
supports out of the box.

However, I noticed some limitations of the current `GitbookLoader` which
this PR adresses:

The main change is that I added an optional `base_url` arg to
`GitbookLoader`. This enables use cases where one wants to crawl docs
from a start page other than the index page, e.g., the following call
would scrape all pages that are reachable via nav bar links from
"https://docs.zenml.io/v/0.35.0":

```python
GitbookLoader(
    web_page="https://docs.zenml.io/v/0.35.0", 
    load_all_paths=True,
    base_url="https://docs.zenml.io",
)
```

Previously, this would fail because relative links would be of the form
`/v/0.35.0/...` and the full link URLs would become
`docs.zenml.io/v/0.35.0/v/0.35.0/...`.

I also fixed another issue of the `GitbookLoader` where the link URLs
were constructed incorrectly as `website//relative_url` if the provided
`web_page` had a trailing slash.
2023-03-09 16:32:40 -08:00
Andriy Mulyar
c9189d354a
AtlasDB vector store documentation updates. (#1572)
- Updated errors in the AtlasDB vector store documentation
- Removed extraneous output logs in example notebook.
2023-03-09 16:31:14 -08:00
622578a022
docs: fix typo in searx tool (#1569)
Co-authored-by: blob42 <spike@w530>
2023-03-09 15:58:33 -08:00
Matt Robinson
7018806a92
feat: document loader for markdown files (#1558)
### Summary

Adds a document loader for handling markdown files. This document loader
requires `unstructured>=0.4.16`.

### Testing

```python
from langchain.document_loaders import UnstructuredMarkdownLoader

loader = UnstructuredMarkdownLoader("README.md")
loader.load()
```
2023-03-09 10:55:07 -08:00
Harrison Chase
bd335ffd64
bump version to 106 (#1562) 2023-03-09 10:20:54 -08:00
Harrison Chase
a094c49153
add chat agent (#1509) 2023-03-09 09:12:08 -08:00
Brenton Wheeler
99fe023496
docs: fix typo in modules/indexes/chain_examples/question_answering (#1551)
docs: fix typo in modules/indexes/chain_examples/question_answering


![image](https://user-images.githubusercontent.com/11394076/224007874-3a52adf6-ff7a-4f22-9dbf-18c83d08167f.png)
2023-03-09 09:11:43 -08:00
Harrison Chase
3ee32a01ea
Harrison/prompt layer (#1547)
Co-authored-by: Jonathan Pedoeem <jonathanped@gmail.com>
Co-authored-by: AbuBakar <abubakarsohail123@gmail.com>
2023-03-08 21:24:27 -08:00