Commit Graph

1466 Commits

Author SHA1 Message Date
Harrison Chase
6200a2a00e
Harrison/error hf (#3348)
Co-authored-by: Rui Melo <44201826+rufimelo99@users.noreply.github.com>
2023-04-22 09:06:36 -07:00
Honkware
a5ad1c270f
Add ChatGPT Data Loader (#3336)
This pull request adds a ChatGPT document loader to the document loaders
module in `langchain/document_loaders/chatgpt.py`. Additionally, it
includes an example Jupyter notebook in
`docs/modules/indexes/document_loaders/examples/chatgpt_loader.ipynb`
which uses fake sample data based on the original structure of the
`conversations.json` file.

The following files were added/modified:
- `langchain/document_loaders/__init__.py`
- `langchain/document_loaders/chatgpt.py`
- `docs/modules/indexes/document_loaders/examples/chatgpt_loader.ipynb`
-
`docs/modules/indexes/document_loaders/examples/example_data/fake_conversations.json`

This pull request was made in response to the recent release of ChatGPT
data exports by email:
https://help.openai.com/en/articles/7260999-how-do-i-export-my-chatgpt-history
2023-04-22 09:06:24 -07:00
Zander Chase
61d40ba042
Fix Sagemaker Batch Endpoints (#3249)
Add different typing for @evandiewald 's heplful PR

---------

Co-authored-by: Evan Diewald <evandiewald@gmail.com>
2023-04-22 08:49:51 -07:00
Johann-Peter Hartmann
7e79f8c136
Support recursive sitemaps in SitemapLoader (#3146)
A (very) simple addition to support multiple sitemap urls.

---------

Co-authored-by: Johann-Peter Hartmann <johann-peter.hartmann@mayflower.de>
2023-04-22 08:48:04 -07:00
Filip Haltmayer
215dcc2d26
Refactor Milvus/Zilliz (#3047)
Refactoring milvus/zilliz to clean up and have a more consistent
experience.

Signed-off-by: Filip Haltmayer <filip.haltmayer@zilliz.com>
2023-04-22 08:26:19 -07:00
Harrison Chase
8191c6b81a
Harrison/voice assistant (#3347)
Co-authored-by: Jaden <jaden.lorenc@gmail.com>
2023-04-22 08:25:50 -07:00
Richy Wang
88a8f59aa7
Add a full PostgresSQL syntax database 'AnalyticDB' as vector store. (#3135)
Hi there!
I'm excited to open this PR to add support for using a fully Postgres
syntax compatible database 'AnalyticDB' as a vector.
As AnalyticDB has been proved can be used with AutoGPT,
ChatGPT-Retrieve-Plugin, and LLama-Index, I think it is also good for
you.
AnalyticDB is a distributed Alibaba Cloud-Native vector database. It
works better when data comes to large scale. The PR includes:

- [x]  A new memory: AnalyticDBVector
- [x]  A suite of integration tests verifies the AnalyticDB integration

I have read your [contributing
guidelines](72b7d76d79/.github/CONTRIBUTING.md).
And I have passed the tests below
- [x]  make format
- [x]  make lint
- [x]  make coverage
- [x]  make test
2023-04-22 08:25:41 -07:00
Harrison Chase
cc6fe18152
Harrison/power bi (#3205)
Co-authored-by: Eduard van Valkenburg <eavanvalkenburg@users.noreply.github.com>
2023-04-22 08:24:48 -07:00
Daniel Chalef
61e09229c8
args_schema type hint on subclassing (#3323)
per https://github.com/hwchase17/langchain/issues/3297

Co-authored-by: Daniel Chalef <daniel.chalef@private.org>
2023-04-21 15:51:13 -07:00
Zander Chase
05a8aa5447
Fix linting on master (#3327) 2023-04-21 15:49:46 -07:00
Varun Srinivas
d2f922f525
Change in method name for creating an issue on JIRA (#3307)
The awesome JIRA tool created by @zywilliamli calls the `create_issue()`
method to create issues, however, the actual method is `issue_create()`.

Details in the Documentation here:
https://atlassian-python-api.readthedocs.io/jira.html#manage-issues
2023-04-21 13:01:33 -07:00
Davis Chase
e933be9605
Update docs api references (#3315) 2023-04-21 12:21:33 -07:00
Paul Garner
aa9d5707e0
Add PythonLoader which auto-detects encoding of Python files (#3311)
This PR contributes a `PythonLoader`, which inherits from
`TextLoader` but detects and sets the encoding automatically.
2023-04-21 10:47:57 -07:00
Daniel Chalef
1ecbeec24e
Fix example match_documents fn table name, grammar (#3294)
ref
https://github.com/hwchase17/langchain/pull/3100#issuecomment-1517086472

Co-authored-by: Daniel Chalef <daniel.chalef@private.org>
2023-04-21 10:21:23 -07:00
Davis Chase
2fd24d31a4
Cleanup integration test dir (#3308) 2023-04-21 09:44:09 -07:00
leo-gan
3bc703b0d6
added links to the important YouTube videos (#3244)
Added links to the important YouTube videos
2023-04-21 01:31:42 -07:00
Sertaç Özercan
1e91266a8a
fix: handle youtube TranscriptsDisabled (#3276)
handles error when youtube video has transcripts disabled

```
youtube_transcript_api._errors.TranscriptsDisabled: 
Could not retrieve a transcript for the video https://www.youtube.com/watch?v=<URL> This is most likely caused by:

Subtitles are disabled for this video

If you are sure that the described cause is not responsible for this error and that a transcript should be retrievable, please create an issue at https://github.com/jdepoix/youtube-transcript-api/issues. Please add which version of youtube_transcript_api you are using and provide the information needed to replicate the error. Also make sure that there are no open issues which already describe your problem!
```

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>
2023-04-21 01:27:42 -07:00
Alexandre Pesant
04e1d6c699
Do not print openai settings (#3280)
There's no reason to print these settings like that, it just pollutes
the logs :)
2023-04-21 01:20:17 -07:00
Zander Chase
a71a2c0eb2
Handle null action in AutoGPT Agent (#3274)
Handle the case where the command is `null`
2023-04-20 23:18:46 -07:00
Harrison Chase
bf78200f55
bump version 146 (#3272) 2023-04-20 22:20:43 -07:00
Harrison Chase
87544d2378
gradio tools (#3255) 2023-04-20 22:09:15 -07:00
Naveen Tatikonda
bb6c459f7a
OpenSearch: Add Support for Lucene Filter (#3201)
### Description
Add Support for Lucene Filter. When you specify a Lucene filter for a
k-NN search, the Lucene algorithm decides whether to perform an exact
k-NN search with pre-filtering or an approximate search with modified
post-filtering. This filter is supported only for approximate search
with the indexes that are created using `lucene` engine.

OpenSearch Documentation -
https://opensearch.org/docs/latest/search-plugins/knn/filter-search-knn/#lucene-k-nn-filter-implementation

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
2023-04-20 20:42:53 -07:00
Davis Chase
36720cb57f
Hf emb device (#3266)
Make it possible to control the HuggingFaceEmbeddings and HuggingFaceInstructEmbeddings client model kwargs. Additionally, the cache folder was added for HuggingFaceInstructEmbedding as the client inherits from SentenceTransformer (client of HuggingFaceEmbeddings).

It can be useful, especially to control the client device, as it will be defaulted to GPU by sentence_transformers if there is any.

---------

Co-authored-by: Yoann Poupart <66315201+Xmaster6y@users.noreply.github.com>
2023-04-20 20:41:22 -07:00
Zach Jones
d7942a9f19
Fix type annotation for QueryCheckerTool.llm (#3237)
Currently `langchain.tools.sql_database.tool.QueryCheckerTool` has a
field `llm` with type `BaseLLM`. This breaks initialization for some
LLMs. For example, trying to use it with GPT4:

```python
from langchain.sql_database import SQLDatabase
from langchain.chat_models import ChatOpenAI
from langchain.tools.sql_database.tool import QueryCheckerTool


db = SQLDatabase.from_uri("some_db_uri")
llm = ChatOpenAI(model_name="gpt-4")
tool = QueryCheckerTool(db=db, llm=llm)

# pydantic.error_wrappers.ValidationError: 1 validation error for QueryCheckerTool
# llm
#   Can't instantiate abstract class BaseLLM with abstract methods _agenerate, _generate, _llm_type (type=type_error)
```

Seems like much of the rest of the codebase has switched from `BaseLLM`
to `BaseLanguageModel`. This PR makes the change for QueryCheckerTool as
well

Co-authored-by: Zachary Jones <zjones@zetaglobal.com>
2023-04-20 18:50:59 -07:00
Davis Chase
46542dc774
Contextual compression retriever (#2915)
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-04-20 17:01:14 -07:00
Matt Robinson
3943759a90
feat: add loader for rich text files (#3227)
### Summary

Adds a loader for rich text files. Requires `unstructured>=0.5.12`.

### Testing

The following test uses the example RTF file from the [`unstructured`
repo](https://github.com/Unstructured-IO/unstructured/tree/main/example-docs).

```python
from langchain.document_loaders import UnstructuredRTFLoader

loader = UnstructuredRTFLoader("fake-doc.rtf", mode="elements")
docs = loader.load()
docs[0].page_content
```
2023-04-20 15:51:49 -07:00
Harrison Chase
5ef2d1e2a1 add to docs 2023-04-20 15:43:57 -07:00
Harrison Chase
4aedbeaffb Merge branch 'master' of github.com:hwchase17/langchain 2023-04-20 15:43:04 -07:00
Harrison Chase
2dbb5261b5 wikibase agent 2023-04-20 15:37:56 -07:00
Albert Castellana
0684aa081a
Ecosystem/Yeager.ai (#3239)
Added yeagerai.md to ecosystem
2023-04-20 15:20:21 -07:00
Boris Feld
0e797a3ff9
Fixing issue link for Comet callback (#3212)
Sorry I fixed that link once but there was still a typo inside, this
time it should be good.
2023-04-20 14:57:41 -07:00
Daniel Chalef
ae528fd06e
fix error msg ref to beautifulsoup4 (#3242)
Co-authored-by: Daniel Chalef <daniel.chalef@private.org>
2023-04-20 14:03:32 -07:00
Tom Dyson
7d3e6389f2
Add DuckDB prompt (#3233)
Adds a prompt template for the DuckDB SQL dialect.
2023-04-20 14:02:20 -07:00
Zander Chase
daee0b2b97
Patch Chat History Formatting (#3236)
While we work on solidifying the memory interfaces, handle common chat
history formats.

This may break linting on anyone who has been passing in
`get_chat_history` .

Somewhat handles #3077

Alternative to #3078 that updates the typing
2023-04-20 13:31:30 -07:00
Harrison Chase
8f22949dc4 update nnotebook title 2023-04-20 11:53:23 -07:00
leo-gan
130e4b9fcb
fixed a link to the youtube page (#3232)
A link to the `YouTube` page was missing on the `index` page.
2023-04-20 10:47:16 -07:00
Peter Stolz
d54b977d4e
Fix docstring of RetrievalQA (#3231)
Structure changed an RetrievalQA now expects BaseRetriever not
VectorStore
2023-04-20 10:46:51 -07:00
Harrison Chase
b7dea80cba
bump version to 145 (#3229) 2023-04-20 08:30:38 -07:00
Harrison Chase
b7f2061736
Harrison/google places (#3207)
Co-authored-by: Cao Hoang <65607230+cnhhoang850@users.noreply.github.com>
Co-authored-by: vowelparrot <130414180+vowelparrot@users.noreply.github.com>
2023-04-20 07:57:07 -07:00
Gabriel Altay
34fb56b633
fix copy/pasta typos wikipedia->arxiv (#3222)
just updates a few module level docstrings from Wikipedia -> Arxiv
2023-04-20 07:15:41 -07:00
Harrison Chase
d2520a5f1e
Harrison/ddg (#3206)
Co-authored-by: itai <itai.marks@gmail.com>
Co-authored-by: Itai Marks <itaim@users.noreply.github.com>
Co-authored-by: Tianyi Pan <60060750+tipani86@users.noreply.github.com>
Co-authored-by: Tianyi Pan <tianyi.pan@clobotics.com>
Co-authored-by: Adilzhan Ismailov <13088690+aismlv@users.noreply.github.com>
Co-authored-by: Justin Flick <Justinjayflick@gmail.com>
Co-authored-by: Justin Flick <jflick@homesite.com>
2023-04-19 21:32:26 -07:00
Harrison Chase
36c10f8a52
nits (#3203) 2023-04-19 21:14:46 -07:00
Daniel Chalef
27cdf8d675
supabase vectorstore - first cut (#3100)
First cut of a supabase vectorstore loosely patterned on the langchainjs
equivalent. Doesn't support async operations which is a limitation of
the supabase python client.

---------

Co-authored-by: Daniel Chalef <daniel.chalef@private.org>
2023-04-19 21:06:44 -07:00
Harrison Chase
9a0356d276
Harrison/file chat history (#3198)
Co-authored-by: Young Lee <joybro201@gmail.com>
2023-04-19 21:05:20 -07:00
Kazon Wilson
a66cab8b71
Add new line to refine prompt tmpl (#3197)
Adding a new line to fix issue #3117
2023-04-19 21:04:52 -07:00
Harrison Chase
96809b5794
Harrison/discord loader (#3200)
Co-authored-by: Rajtilak Bhattacharjee <rajtilak.blog@gmail.com>
2023-04-19 21:04:12 -07:00
Justin Flick
8faef1a91a
Confluence DL retry/backoff (#3168)
Implemented a retry/backoff logic in response to #2473

---------

Co-authored-by: Justin Flick <jflick@homesite.com>
2023-04-19 20:50:39 -07:00
Adilzhan Ismailov
c03a65c6dc
Fix from_embeddings method examples (#3174)
Fix examples for `from_embeddings` method for annoy and faiss
vectorstores
2023-04-19 20:49:33 -07:00
Harrison Chase
f19b3890c9
Harrison/site map tqdm (#3184)
Co-authored-by: Tianyi Pan <60060750+tipani86@users.noreply.github.com>
Co-authored-by: Tianyi Pan <tianyi.pan@clobotics.com>
2023-04-19 20:48:47 -07:00
Harrison Chase
e55db5841a
Harrison/svm speedup (#3195)
Co-authored-by: Lance Martin <122662504+PineappleExpress808@users.noreply.github.com>
2023-04-19 20:14:01 -07:00