Commit Graph

936 Commits (mine)
 

Author SHA1 Message Date
Tim Asp 01a57198b8
[bugfix] Fix persisted chromadb vectorstore (#1444)
If a `persist_directory` param was set, chromadb would throw a warning
that "No embedding_function provided, using default embedding function:
SentenceTransformerEmbeddingFunction" and would error with an `Illegal
instruction: 4` error.

This is on a MBP M1 13.2.1, python 3.9.

I'm not entirely sure why that error happened, but when using
`get_or_create_collection` instead of `list_collections` on our end, the
error and warning go away and chroma works as expected.

As an added bonus, this is cleaner and likely more efficient.
`list_collections` builds a new `Collection` instance for each collection,
then `Chroma` would just use the `name` field to tell if the collection
existed.
1 year ago
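As a rough illustration of the change described in #1444, the sketch below contrasts "list everything, then compare names" with a single get-or-create call. `ToyClient` and `Collection` are hypothetical in-memory stand-ins written for this example; they are not the chromadb classes, and only the method names mirror the ones mentioned in the commit message.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Collection:
    """Hypothetical stand-in for a collection handle."""

    name: str


@dataclass
class ToyClient:
    """Hypothetical in-memory client; NOT the real chromadb client."""

    _collections: Dict[str, Collection] = field(default_factory=dict)

    def list_collections(self) -> List[Collection]:
        # Materializes every collection, even if the caller needs only one.
        return list(self._collections.values())

    def create_collection(self, name: str) -> Collection:
        if name in self._collections:
            raise ValueError(f"collection {name!r} already exists")
        self._collections[name] = Collection(name)
        return self._collections[name]

    def get_or_create_collection(self, name: str) -> Collection:
        # One lookup: return the existing collection or create it.
        if name not in self._collections:
            self._collections[name] = Collection(name)
        return self._collections[name]


def open_by_listing(client: ToyClient, name: str) -> Collection:
    # Old pattern: scan all collections just to compare names.
    for collection in client.list_collections():
        if collection.name == name:
            return collection
    return client.create_collection(name)


def open_directly(client: ToyClient, name: str) -> Collection:
    # New pattern: a single call, no scan.
    return client.get_or_create_collection(name)
```

Both helpers return the same collection for the same name; the second simply avoids building the full list first.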
Harrison Chase 8dba30f31e
Harrison/kwargs loaders (#1588)
Co-authored-by: Tim Asp <707699+timothyasp@users.noreply.github.com>
1 year ago
Harrison Chase 9f78717b3c
Harrison/callbacks (#1587) 1 year ago
Harrison Chase 90846dcc28
fix chat agent (#1586) 1 year ago
Claus Thomasen 6ed16e13b1
Readded similarity_search_by_vector (#1568)
I am redoing this PR, as I made a mistake by merging the latest changes
into my fork's branch, sorry. This added a bunch of commits to my
previous PR.

This fixes #1451.
1 year ago
Harrison Chase c1dc784a3d
buffer memory old version (#1581)
bring back an older version of memory since people seem to be using it
more widely
1 year ago
fabi.s 5b0e747f9a
Fix description of UnstructuredURLLoader & UnstructuredHTMLLoader (#1570) 1 year ago
Zach Schillaci 624c72c266
Add wikipedia tool doc (#1579) 1 year ago
Ryan Dao a950287206
Strip trailing whitespaces in agent's stop sequences (#1566)
Fixes #1489
1 year ago
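A minimal sketch of the idea behind #1566, assuming stop sequences are held as a plain list of strings. The helper name `strip_stop_sequences` is hypothetical, and keeping whitespace-only entries unchanged is a choice made for this example, not something the commit states.

```python
from typing import List


def strip_stop_sequences(stop: List[str]) -> List[str]:
    """Return the stop sequences with trailing whitespace removed."""
    cleaned = []
    for sequence in stop:
        stripped = sequence.rstrip()
        # Assumption: keep whitespace-only entries (e.g. "\n") as they are,
        # so stripping never produces an empty stop sequence.
        cleaned.append(stripped if stripped else sequence)
    return cleaned
```

Leading whitespace is left alone, so a sequence such as `"\nObservation: "` becomes `"\nObservation:"`.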
Tim Asp 30383abb12
Add CSVLoader document loader (#1573)
Simple CSV document loader which wraps the `csv` reader and produces a
single `Document` per row.

The column header is prepended to each value, which gives useful context
for embedding and semantic search.
1 year ago
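The row-per-document idea from #1573 can be sketched with the standard library alone. The `Document` dataclass, the `rows_to_documents` helper, and the metadata keys below are assumptions made for this example; they are not taken from the actual `CSVLoader` code.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Document:
    """Hypothetical stand-in for a document type."""

    page_content: str
    metadata: Dict[str, object] = field(default_factory=dict)


def rows_to_documents(csv_text: str, source: str = "inline") -> List[Document]:
    """Build one Document per CSV row, prefixing each value with its header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    documents = []
    for index, row in enumerate(reader):
        # "column: value" lines keep the header next to each value.
        content = "\n".join(f"{column}: {value}" for column, value in row.items())
        documents.append(
            Document(page_content=content, metadata={"source": source, "row": index})
        )
    return documents
```

For a file with the header `name,city`, each row becomes a document whose text reads `name: ...` and `city: ...` on separate lines.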
Zach Schillaci cdb97f3dfb
Add Wikipedia search utility and tool (#1561)
The Python `wikipedia` package gives easy access for searching and
fetching pages from Wikipedia, see https://pypi.org/project/wikipedia/.
It can serve as an additional search and retrieval tool, like the
existing Google and SerpAPI helpers, for both chains and agents.
1 year ago
Felix Altenberger b44c8bd969
Add optional `base_url` arg to `GitbookLoader` (#1552)
First of all, big kudos on what you guys are doing, langchain is
enabling some really amazing use cases and I'm having lots of fun
playing around with it. It's really cool how many data sources it
supports out of the box.

However, I noticed some limitations of the current `GitbookLoader` which
this PR addresses:

The main change is that I added an optional `base_url` arg to
`GitbookLoader`. This enables use cases where one wants to crawl docs
from a start page other than the index page, e.g., the following call
would scrape all pages that are reachable via nav bar links from
"https://docs.zenml.io/v/0.35.0":

```python
GitbookLoader(
    web_page="https://docs.zenml.io/v/0.35.0", 
    load_all_paths=True,
    base_url="https://docs.zenml.io",
)
```

Previously, this would fail because relative links would be of the form
`/v/0.35.0/...` and the full link URLs would become
`docs.zenml.io/v/0.35.0/v/0.35.0/...`.

I also fixed another issue of the `GitbookLoader` where the link URLs
were constructed incorrectly as `website//relative_url` if the provided
`web_page` had a trailing slash.
1 year ago
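The two URL problems described in #1552 can be reproduced with plain string handling. This is only a sketch: `naive_join` and `join_with_base` are hypothetical helpers, `example-page` is a made-up page name, and using `urllib.parse.urljoin` is one way to get the corrected behavior, not necessarily what the loader does.

```python
from urllib.parse import urljoin


def naive_join(web_page: str, relative_url: str) -> str:
    # Concatenating onto the start page repeats the version prefix and
    # yields "//" when the page has a trailing slash.
    return web_page + relative_url


def join_with_base(base_url: str, relative_url: str) -> str:
    # Resolving a root-relative link against the site root avoids both issues.
    return urljoin(base_url, relative_url)


relative = "/v/0.35.0/example-page"  # hypothetical nav bar link

duplicated = naive_join("https://docs.zenml.io/v/0.35.0", relative)
double_slash = naive_join("https://docs.zenml.io/", relative)
fixed = join_with_base("https://docs.zenml.io", relative)
```

Here `duplicated` contains `/v/0.35.0/v/0.35.0/`, `double_slash` contains `docs.zenml.io//v`, and `fixed` is the intended `https://docs.zenml.io/v/0.35.0/example-page`.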
Andriy Mulyar c9189d354a
AtlasDB vector store documentation updates. (#1572)
- Fixed errors in the AtlasDB vector store documentation
- Removed extraneous output logs in example notebook.
1 year ago
blob42 622578a022
docs: fix typo in searx tool (#1569)
Co-authored-by: blob42 <spike@w530>
1 year ago
Matt Robinson 7018806a92
feat: document loader for markdown files (#1558)
### Summary

Adds a document loader for handling markdown files. This document loader
requires `unstructured>=0.4.16`.

### Testing

```python
from langchain.document_loaders import UnstructuredMarkdownLoader

loader = UnstructuredMarkdownLoader("README.md")
loader.load()
```
1 year ago
Harrison Chase bd335ffd64
bump version to 106 (#1562) 1 year ago
Harrison Chase a094c49153
add chat agent (#1509) 1 year ago
Brenton Wheeler 99fe023496
docs: fix typo in modules/indexes/chain_examples/question_answering (#1551)
![image](https://user-images.githubusercontent.com/11394076/224007874-3a52adf6-ff7a-4f22-9dbf-18c83d08167f.png)
1 year ago
Harrison Chase 3ee32a01ea
Harrison/prompt layer (#1547)
Co-authored-by: Jonathan Pedoeem <jonathanped@gmail.com>
Co-authored-by: AbuBakar <abubakarsohail123@gmail.com>
1 year ago
Harrison Chase c844d1fd46
Harrison/chunk size (#1549)
Co-authored-by: Florian Leuerer <31259070+floleuerer@users.noreply.github.com>
1 year ago
Harrison Chase 9405af6919
Harrison/hf inf error (#1543)
Co-authored-by: Konstantin Hebenstreit <57603012+KonstantinHebenstreit@users.noreply.github.com>
1 year ago
Harrison Chase 357d808484
Harrison/remote paths pdf (#1544)
Co-authored-by: Tim Asp <707699+timothyasp@users.noreply.github.com>
1 year ago
Harrison Chase cc423f40f1
Harrison/youtube loader (#1545)
Co-authored-by: Julian Wustl <57504258+Julianwustl@users.noreply.github.com>
1 year ago
Harrison Chase b053f831cd
Harrison/contributing (#1542)
Co-authored-by: Saurav Maheshkar <sauravvmaheshkar@gmail.com>
1 year ago
Harrison Chase 523ad8d2e2
Harrison/chat history formatter1 (#1538)
Co-authored-by: Youssef A. Abukwaik <yousseb@users.noreply.github.com>
1 year ago
Graham Neubig 31303d0b11
Added other evaluation metrics for data-augmented QA (#1521)
This PR adds additional evaluation metrics for data-augmented QA,
resulting in a report like this at the end of the notebook:

![Screen Shot 2023-03-08 at 8 53 23 AM](https://user-images.githubusercontent.com/398875/223731199-8eb8e77f-5ff3-40a2-a23e-f3bede623344.png)

The score calculation is based on the
[Critique](https://docs.inspiredco.ai/critique/) toolkit, an API-based
toolkit (like OpenAI) that has minimal dependencies, so it should be
easy for people to run if they choose.

The code could further be simplified by actually adding a chain that
calls Critique directly, but that probably should be saved for another
PR if necessary. Any comments or change requests are welcome!
1 year ago
gidler 494c9d341a
[DOCS] Assorted wording, punctuation, and consistency revisions (#1443)
Contributing some small fixes I noticed while reading through the
documentation.

Thank you for creating and maintaining this project!
1 year ago
Harrison Chase 519f0187b6
Harrison/gdrive pdf (#1433)
Co-authored-by: LM <93918064+LuisMalhadas@users.noreply.github.com>
Co-authored-by: Luis Malhadas <luis@sia.so>
1 year ago
Florian Leuerer 64c6435545
Added client_settings support for chromadb vecstore (#1528)
# Problem

The ChromaDB vecstore only supported local connections. There was no way
to use a chromadb server.

# Fix
Added `client_settings` as a `Chroma` attribute.

# Usage

```python
from chromadb.config import Settings
from langchain.vectorstores import Chroma

# `chunks`, `embeddings`, `metadatas`, and `COLLECTION_NAME` are assumed to be defined by the caller.
chroma_settings = Settings(
    chroma_api_impl="rest",
    chroma_server_host="localhost",
    chroma_server_http_port="80",
)

docsearch = Chroma.from_documents(
    chunks,
    embeddings,
    metadatas=metadatas,
    client_settings=chroma_settings,
    collection_name=COLLECTION_NAME,
)
```
1 year ago
Harrison Chase 7eba828e1b
Harrison/update regex (#1534)
Co-authored-by: Luis <57528712+LuisLechugaRuiz@users.noreply.github.com>
1 year ago
Harrison Chase 2a7215bc3b
Harrison/prompt issues (#1537) 1 year ago
Alpri Else 784d24a1d5
Support S3 Object keys with `/` in `S3FileLoader` (#1517)
Resolves https://github.com/hwchase17/langchain/issues/1510

### Problem
When loading S3 objects with `/` in the object key (e.g.
`folder/some-document.txt`) using `S3FileLoader`, the objects are
downloaded into a temporary directory and saved as a file.

This errors out when the parent directory does not exist within the
temporary directory.

See
https://github.com/hwchase17/langchain/issues/1510#issuecomment-1459583696
for how to reproduce this bug.

### What this PR does
Creates parent directories based on the object key.

This also works with deeply nested keys:
`folder/subfolder/some-document.txt`
1 year ago
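A minimal sketch of the fix described in #1517, assuming the object bytes are already in hand. The helper `write_object_to_temp` is hypothetical and does not call S3; it only shows creating the parent directories implied by a key before writing the file.

```python
import os
import tempfile


def write_object_to_temp(temp_dir: str, key: str, data: bytes) -> str:
    """Write `data` under `temp_dir` at the path given by the object key."""
    file_path = os.path.join(temp_dir, key)
    # Create any parent directories implied by "/" in the key.
    # For keys without "/", the parent is temp_dir itself, which already exists.
    os.makedirs(os.path.dirname(file_path), exist_ok=True)
    with open(file_path, "wb") as handle:
        handle.write(data)
    return file_path


with tempfile.TemporaryDirectory() as temp_dir:
    nested_path = write_object_to_temp(
        temp_dir, "folder/subfolder/some-document.txt", b"hello"
    )
    nested_exists = os.path.isfile(nested_path)
```

Without the `os.makedirs` call, opening the nested path for writing would raise `FileNotFoundError` because `folder/subfolder` does not exist yet.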
Harrison Chase aba58e9e2e
Harrison/bumpver104 (#1525) 1 year ago
Harrison Chase c4a557bdd4
add concept of prompt collection (#1507) 1 year ago
Ivan 97e3666e0d
changed requests.run to requests.get (#1485)
This pull request proposes an update to the Lightweight wrapper
library's documentation. The current documentation provides an example
of how to use the library's `requests.run` method, as follows:
`requests.run("https://www.google.com")`. However, this example does not
work for the 0.0.102 version of the library, so it is changed to use
`requests.get`.

Testing:

The changes have been tested locally to ensure they are working as
intended.

Thank you for considering this pull request.
1 year ago
Harrison Chase 7ade419a0e
allow passing of messages into prompt template (#1505) 1 year ago
Harrison Chase a4a2d79087
Harrison/rtd loader (#1513)
Co-authored-by: Youssef A. Abukwaik <yousseb@users.noreply.github.com>
1 year ago
Harrison Chase 8f21605d71
add return source docs (#1515) 1 year ago
Harrison Chase 064741db58
Harrison/fix text splitter (#1511)
Co-authored-by: ajaysolanky <ajsolanky@gmail.com>
Co-authored-by: Ajay Solanky <ajaysolanky@saw-l14668307kd.myfiosgateway.com>
1 year ago
Tom Dyson e3354404ad
Fix link to Pinecone notebook (#1492) 1 year ago
Harrison Chase 3610ef2830
add fake embeddings class (#1503) 1 year ago
Ankush Gola 27104d4921
fix `ChatOpenAI.agenerate` (#1504) 1 year ago
Harrison Chase 4f41e20f09
memory docs (#1501) 1 year ago
Harrison Chase d0062c7a9a
bump version to 103 (#1498) 1 year ago
Harrison Chase 8e6f599822
change to baselanguagemodel (#1496) 1 year ago
Harrison Chase f276bfad8e
Harrison/chat memory (#1495) 1 year ago
Harrison Chase 7bec461782
Harrison/memory refactor (#1478)
moves memory to own module, factors out common stuff
1 year ago
kahkeng df6865cd52
Allow no token limit for ChatGPT API (#1481)
The endpoint default is inf if we don't specify `max_tokens`, so unlike
the regular completion API, we don't need to calculate this based on the
prompt.
1 year ago
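One way to picture the behavior described in #1481 is a parameter builder that leaves `max_tokens` out of the request unless the caller sets it. The function `build_chat_params` and the use of `None` to mean "no limit" are assumptions made for this sketch, not the library's actual code.

```python
from typing import Any, Dict, Optional


def build_chat_params(model: str, max_tokens: Optional[int] = None) -> Dict[str, Any]:
    """Build request parameters, omitting max_tokens when no limit is set."""
    params: Dict[str, Any] = {"model": model}
    if max_tokens is not None:
        # Only send an explicit cap; otherwise the endpoint default applies.
        params["max_tokens"] = max_tokens
    return params


unlimited = build_chat_params("gpt-3.5-turbo")
capped = build_chat_params("gpt-3.5-turbo", max_tokens=256)
```

With no cap given, the resulting dictionary has no `max_tokens` key at all, so nothing needs to be computed from the prompt length.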
Harrison Chase 312c319d8b
bump version to 102 (#1471) 1 year ago
Harrison Chase 0e21463f07
(rfc) chat models (#1424)
Co-authored-by: Ankush Gola <ankush.gola@gmail.com>
1 year ago