Commit Graph

1682 Commits (05170b676429fa1ed7bf7335c517d9f1db3a4851)
 

Author SHA1 Message Date
Davit Buniatyan 2c0023393b
Deep Lake mini upgrades (#3375)
Improvements
* set default num_workers for ingestion to 0
* upgraded notebooks for avoiding dataset creation ambiguity
* added `force_delete_dataset_by_path`
* bumped deeplake to 3.3.0
* creds arg passing to deeplake object that would allow custom S3

Notes
* please double check if poetry is not messed up (thanks!)

Asks
* Would be great to create a shared slack channel for quick questions

---------

Co-authored-by: Davit Buniatyan <d@activeloop.ai>
1 year ago
Haste171 93d53e417a
Update unstructured_file.ipynb (#3377)
Fix typo in docs
1 year ago
张城铭 487a57ffe6
Optimize code (#3412)
Co-authored-by: assert <zhangchengming@kkguan.com>
1 year ago
Zander Chase 3d8243ec95
Catch all exceptions in autogpt (#3413)
Ought to be more autonomous
1 year ago
Zander Chase 738ee56b86
Move Generative Agent definition to Experimental (#3245)
Extending @BeautyyuYanli 's #3220 to move from the notebook

---------

Co-authored-by: BeautyyuYanli <beautyyuyanli@gmail.com>
1 year ago
Zander Chase 20f530e9c5
Add Sentence Transformers Embeddings (#3409)
Add embeddings based on the sentence transformers library.
Add a notebook and integration tests.

Co-authored-by: khimaros <me@khimaros.com>
1 year ago
Zander Chase 73bc70b4fa
Update marathon notebook (#3408)
Fixes #3404
1 year ago
Luke Harris b4de839ed8
Several confluence loader improvements (#3300)
This PR addresses several improvements:

- Previously it was not possible to load spaces of more than 100 pages.
The `limit` was being used both as an overall page limit *and* as a per
request pagination limit. This, in combination with the fact that
atlassian seem to use a server-side hard limit of 100 when page content
is expanded, meant it wasn't possible to download >100 pages. Now
`limit` is used *only* as a per-request pagination limit and `max_pages`
is introduced as the way to limit the total number of pages returned by
the paginator.
- Document metadata now includes `source` (the source url), making it
compatible with `RetrievalQAWithSourcesChain`.
 - It is now possible to include inline and footer comments.
- It is now possible to pass `verify_ssl=False` and other parameters to
the confluence object for use cases that require it.
1 year ago
zz 651cb62556
Add support for wikipedia's lang parameter (#3383)
Allow to hange the language of the wikipedia API being requested.

Co-authored-by: zhuohui <zhuohui@datastory.com.cn>
1 year ago
Johann-Peter Hartmann 199cb855ea
Improve youtube loader (#3395)
Small improvements for the YouTube loader: 
a) use the YouTube API permission scope instead of Google Drive 
b) bugfix: allow transcript loading for single videos 
c) an additional parameter "continue_on_failure" for cases when videos
in a playlist do not have transcription enabled.
d) support automated translation for all languages, if available.

---------

Co-authored-by: Johann-Peter Hartmann <johann-peter.hartmann@mayflower.de>
1 year ago
Harrison Chase e5ffbee5eb
Harrison/hf document loader (#3394)
Co-authored-by: Azam Iftikhar <azamiftikhar1000@gmail.com>
1 year ago
Hadi Curtay acfd11c8e4
Updated incorrect link to Weaviate notebook (#3362)
The detailed walkthrough of the Weaviate wrapper was pointing to the
getting-started notebook. Fixed it to point to the Weaviable notebook in
the examples folder.
1 year ago
Ismail Pelaseyed b21fe0a18f
Add example on deploying LangChain to `Cloud Run` (#3366)
## Summary

Adds a link to a minimal example of running LangChain on Google Cloud
Run.
1 year ago
Ivan Zatevakhin 77bb6c99f7
llamacpp wrong default value passed for `f16_kv` (#3320)
Fixes default f16_kv value in llamacpp; corrects incorrect parameter
passed.

See:
ba3959eafd/llama_cpp/llama.py (L33)

Fixes #3241
Fixes #3301
1 year ago
Harrison Chase 3a1bdce3f5
bump version to 147 (#3353) 1 year ago
Harrison Chase a6664be79c
Harrison/myscale (#3352)
Co-authored-by: Fangrui Liu <fangruil@moqi.ai>
Co-authored-by: 刘 方瑞 <fangrui.liu@outlook.com>
Co-authored-by: Fangrui.Liu <fangrui.liu@ubc.ca>
1 year ago
Harrison Chase 6200a2a00e
Harrison/error hf (#3348)
Co-authored-by: Rui Melo <44201826+rufimelo99@users.noreply.github.com>
1 year ago
Honkware a5ad1c270f
Add ChatGPT Data Loader (#3336)
This pull request adds a ChatGPT document loader to the document loaders
module in `langchain/document_loaders/chatgpt.py`. Additionally, it
includes an example Jupyter notebook in
`docs/modules/indexes/document_loaders/examples/chatgpt_loader.ipynb`
which uses fake sample data based on the original structure of the
`conversations.json` file.

The following files were added/modified:
- `langchain/document_loaders/__init__.py`
- `langchain/document_loaders/chatgpt.py`
- `docs/modules/indexes/document_loaders/examples/chatgpt_loader.ipynb`
-
`docs/modules/indexes/document_loaders/examples/example_data/fake_conversations.json`

This pull request was made in response to the recent release of ChatGPT
data exports by email:
https://help.openai.com/en/articles/7260999-how-do-i-export-my-chatgpt-history
1 year ago
Zander Chase 61d40ba042
Fix Sagemaker Batch Endpoints (#3249)
Add different typing for @evandiewald 's heplful PR

---------

Co-authored-by: Evan Diewald <evandiewald@gmail.com>
1 year ago
Johann-Peter Hartmann 7e79f8c136
Support recursive sitemaps in SitemapLoader (#3146)
A (very) simple addition to support multiple sitemap urls.

---------

Co-authored-by: Johann-Peter Hartmann <johann-peter.hartmann@mayflower.de>
1 year ago
Filip Haltmayer 215dcc2d26
Refactor Milvus/Zilliz (#3047)
Refactoring milvus/zilliz to clean up and have a more consistent
experience.

Signed-off-by: Filip Haltmayer <filip.haltmayer@zilliz.com>
1 year ago
Harrison Chase 8191c6b81a
Harrison/voice assistant (#3347)
Co-authored-by: Jaden <jaden.lorenc@gmail.com>
1 year ago
Richy Wang 88a8f59aa7
Add a full PostgresSQL syntax database 'AnalyticDB' as vector store. (#3135)
Hi there!
I'm excited to open this PR to add support for using a fully Postgres
syntax compatible database 'AnalyticDB' as a vector.
As AnalyticDB has been proved can be used with AutoGPT,
ChatGPT-Retrieve-Plugin, and LLama-Index, I think it is also good for
you.
AnalyticDB is a distributed Alibaba Cloud-Native vector database. It
works better when data comes to large scale. The PR includes:

- [x]  A new memory: AnalyticDBVector
- [x]  A suite of integration tests verifies the AnalyticDB integration

I have read your [contributing
guidelines](72b7d76d79/.github/CONTRIBUTING.md).
And I have passed the tests below
- [x]  make format
- [x]  make lint
- [x]  make coverage
- [x]  make test
1 year ago
Harrison Chase cc6fe18152
Harrison/power bi (#3205)
Co-authored-by: Eduard van Valkenburg <eavanvalkenburg@users.noreply.github.com>
1 year ago
Daniel Chalef 61e09229c8
args_schema type hint on subclassing (#3323)
per https://github.com/hwchase17/langchain/issues/3297

Co-authored-by: Daniel Chalef <daniel.chalef@private.org>
1 year ago
Zander Chase 05a8aa5447
Fix linting on master (#3327) 1 year ago
Varun Srinivas d2f922f525
Change in method name for creating an issue on JIRA (#3307)
The awesome JIRA tool created by @zywilliamli calls the `create_issue()`
method to create issues, however, the actual method is `issue_create()`.

Details in the Documentation here:
https://atlassian-python-api.readthedocs.io/jira.html#manage-issues
1 year ago
Davis Chase e933be9605
Update docs api references (#3315) 1 year ago
Paul Garner aa9d5707e0
Add PythonLoader which auto-detects encoding of Python files (#3311)
This PR contributes a `PythonLoader`, which inherits from
`TextLoader` but detects and sets the encoding automatically.
1 year ago
Daniel Chalef 1ecbeec24e
Fix example match_documents fn table name, grammar (#3294)
ref
https://github.com/hwchase17/langchain/pull/3100#issuecomment-1517086472

Co-authored-by: Daniel Chalef <daniel.chalef@private.org>
1 year ago
Davis Chase 2fd24d31a4
Cleanup integration test dir (#3308) 1 year ago
leo-gan 3bc703b0d6
added links to the important YouTube videos (#3244)
Added links to the important YouTube videos
1 year ago
Sertaç Özercan 1e91266a8a
fix: handle youtube TranscriptsDisabled (#3276)
handles error when youtube video has transcripts disabled

```
youtube_transcript_api._errors.TranscriptsDisabled: 
Could not retrieve a transcript for the video https://www.youtube.com/watch?v=<URL> This is most likely caused by:

Subtitles are disabled for this video

If you are sure that the described cause is not responsible for this error and that a transcript should be retrievable, please create an issue at https://github.com/jdepoix/youtube-transcript-api/issues. Please add which version of youtube_transcript_api you are using and provide the information needed to replicate the error. Also make sure that there are no open issues which already describe your problem!
```

Signed-off-by: Sertac Ozercan <sozercan@gmail.com>
1 year ago
Alexandre Pesant 04e1d6c699
Do not print openai settings (#3280)
There's no reason to print these settings like that, it just pollutes
the logs :)
1 year ago
Zander Chase a71a2c0eb2
Handle null action in AutoGPT Agent (#3274)
Handle the case where the command is `null`
1 year ago
Harrison Chase bf78200f55
bump version 146 (#3272) 1 year ago
Harrison Chase 87544d2378
gradio tools (#3255) 1 year ago
Naveen Tatikonda bb6c459f7a
OpenSearch: Add Support for Lucene Filter (#3201)
### Description
Add Support for Lucene Filter. When you specify a Lucene filter for a
k-NN search, the Lucene algorithm decides whether to perform an exact
k-NN search with pre-filtering or an approximate search with modified
post-filtering. This filter is supported only for approximate search
with the indexes that are created using `lucene` engine.

OpenSearch Documentation -
https://opensearch.org/docs/latest/search-plugins/knn/filter-search-knn/#lucene-k-nn-filter-implementation

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
1 year ago
Davis Chase 36720cb57f
Hf emb device (#3266)
Make it possible to control the HuggingFaceEmbeddings and HuggingFaceInstructEmbeddings client model kwargs. Additionally, the cache folder was added for HuggingFaceInstructEmbedding as the client inherits from SentenceTransformer (client of HuggingFaceEmbeddings).

It can be useful, especially to control the client device, as it will be defaulted to GPU by sentence_transformers if there is any.

---------

Co-authored-by: Yoann Poupart <66315201+Xmaster6y@users.noreply.github.com>
1 year ago
Zach Jones d7942a9f19
Fix type annotation for `QueryCheckerTool.llm` (#3237)
Currently `langchain.tools.sql_database.tool.QueryCheckerTool` has a
field `llm` with type `BaseLLM`. This breaks initialization for some
LLMs. For example, trying to use it with GPT4:

```python
from langchain.sql_database import SQLDatabase
from langchain.chat_models import ChatOpenAI
from langchain.tools.sql_database.tool import QueryCheckerTool


db = SQLDatabase.from_uri("some_db_uri")
llm = ChatOpenAI(model_name="gpt-4")
tool = QueryCheckerTool(db=db, llm=llm)

# pydantic.error_wrappers.ValidationError: 1 validation error for QueryCheckerTool
# llm
#   Can't instantiate abstract class BaseLLM with abstract methods _agenerate, _generate, _llm_type (type=type_error)
```

Seems like much of the rest of the codebase has switched from `BaseLLM`
to `BaseLanguageModel`. This PR makes the change for QueryCheckerTool as
well

Co-authored-by: Zachary Jones <zjones@zetaglobal.com>
1 year ago
Davis Chase 46542dc774
Contextual compression retriever (#2915)
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
1 year ago
Matt Robinson 3943759a90
feat: add loader for rich text files (#3227)
### Summary

Adds a loader for rich text files. Requires `unstructured>=0.5.12`.

### Testing

The following test uses the example RTF file from the [`unstructured`
repo](https://github.com/Unstructured-IO/unstructured/tree/main/example-docs).

```python
from langchain.document_loaders import UnstructuredRTFLoader

loader = UnstructuredRTFLoader("fake-doc.rtf", mode="elements")
docs = loader.load()
docs[0].page_content
```
1 year ago
Harrison Chase 5ef2d1e2a1 add to docs 1 year ago
Harrison Chase 4aedbeaffb Merge branch 'master' of github.com:hwchase17/langchain 1 year ago
Harrison Chase 2dbb5261b5 wikibase agent 1 year ago
Albert Castellana 0684aa081a
Ecosystem/Yeager.ai (#3239)
Added yeagerai.md to ecosystem
1 year ago
Boris Feld 0e797a3ff9
Fixing issue link for Comet callback (#3212)
Sorry I fixed that link once but there was still a typo inside, this
time it should be good.
1 year ago
Daniel Chalef ae528fd06e
fix error msg ref to beautifulsoup4 (#3242)
Co-authored-by: Daniel Chalef <daniel.chalef@private.org>
1 year ago
Tom Dyson 7d3e6389f2
Add DuckDB prompt (#3233)
Adds a prompt template for the DuckDB SQL dialect.
1 year ago
Zander Chase daee0b2b97
Patch Chat History Formatting (#3236)
While we work on solidifying the memory interfaces, handle common chat
history formats.

This may break linting on anyone who has been passing in
`get_chat_history` .

Somewhat handles #3077

Alternative to #3078 that updates the typing
1 year ago