Commit Graph

327 Commits (ad4eae7ef082b853307895e4560f52304dc74f9e)

Author SHA1 Message Date
Zander Chase 5042bd40d3
Add Shell Tool (#3335)
Create an official bash shell tool to replace the dynamically generated one
1 year ago
Zander Chase 334c162f16
Add Other File Utilities (#3209)
Add other File Utilities, include
- List Directory
- Search for file
- Move
- Copy
- Remove file

Bundle as toolkit
Add a notebook that connects to the Chat Agent, which somewhat supports
multi-arg input tools
Update original read/write files to return the original dir paths and
better handle unsupported file paths.
Add unit tests
1 year ago
Zander Chase da7b51455c
Dynamic tool -> single purpose (#3697)
I think the logic of
https://github.com/hwchase17/langchain/pull/3684#pullrequestreview-1405358565
is too confusing.

I prefer this alternative because:
- All `Tool()` implementations by default will be treated the same as
before. No breaking changes.
- Less reliance on pydantic magic
- The decorator (which only is typed as returning a callable) can infer
schema and generate a structured tool
- Either way, the recommended way to create a custom tool is through
inheriting from the base tool
1 year ago
Zander Chase 4654c58f72
Add validation on agent instantiation for multi-input tools (#3681)
Tradeoffs here:
- No lint-time checking for compatibility
- Differs from JS package
- The signature inference, etc. in the base tool isn't simple
- The `args_schema` is optional 

Pros:
- Forwards compatibility retained
- Doesn't break backwards compatibility
- User doesn't have to think about which class to subclass (single base
tool or dynamic `Tool` interface regardless of input)
-  No need to change the load_tools, etc. interfaces

Co-authored-by: Hasan Patel <mangafield@gmail.com>
1 year ago
Davis Chase b807a114e4
Add query parsing unit tests (#3672) 1 year ago
Eugene Yurtsev 708787dddb
Blob: Add validator and use future annotations (#3650)
Minor changes to the Blob schema.

---------

Co-authored-by: Zander Chase <130414180+vowelparrot@users.noreply.github.com>
1 year ago
Eugene Yurtsev c5a4b4fea1
Suppress duckdb warning in unit tests explicitly (#3653)
This catches the warning raised when using duckdb, asserts that it's as expected.

The goal is to resolve all existing warnings to make unit-testing much stricter.
1 year ago
Eugene Yurtsev e6c8cce050
Add unit-test to catch changes to required deps (#3662)
This adds a unit test that can catch changes to required dependencies
1 year ago
Eugene Yurtsev 055f58960a
Fix pytest collection warning (#3651)
Fixes a pytest collection warning because the test class starts with the
prefix "Test"
1 year ago
plutopulp 6d6fd1b9e1
Add PipelineAI LLM integration (#3644)
Add PipelineAI LLM integration
1 year ago
Harrison Chase a35bbbfa9e
Harrison/lancedb (#3634)
Co-authored-by: Minh Le <minhle@canva.com>
1 year ago
Eugene Yurtsev 5d02010763
Introduce Blob and Blob Loader interface (#3603)
This PR introduces a Blob data type and a Blob loader interface.

This is the first of a sequence of PRs that follows this proposal: 

https://github.com/hwchase17/langchain/pull/2833

The primary goals of these abstraction are:

* Decouple content loading from content parsing code.
* Help duplicated content loading code from document loaders.
* Make lazy loading a default for langchain.
1 year ago
Harrison Chase ab749fa1bb
Harrison/opensearch logic (#3631)
Co-authored-by: engineer-matsuo <95115586+engineer-matsuo@users.noreply.github.com>
1 year ago
Ehsan M. Kermani 4a246e2fd6
Allow clearing cache and fix gptcache (#3493)
This PR

* Adds `clear` method for `BaseCache` and implements it for various
caches
* Adds the default `init_func=None` and fixes gptcache integtest
* Since right now integtest is not running in CI, I've verified the
changes by running `docs/modules/models/llms/examples/llm_caching.ipynb`
(until proper e2e integtest is done in CI)
1 year ago
cs0lar 440c98e24b
Fix/issue 2695 (#3608)
## Background
fixes #2695  

## Changes
The `add_text` method uses the internal embedding function if one was
passes to the `Weaviate` constructor.
NOTE: the latest merge on the `Weaviate` class made the specification of
a `weaviate_api_key` mandatory which might not be desirable for all
users and connection methods (for example weaviate also support Embedded
Weaviate which I am happy to add support to here if people think it's
desirable). I wrapped the fetching of the api key into a try catch in
order to allow the `weaviate_api_key` to be unspecified. Do let me know
if this is unsatisfactory.

## Test Plan
added test for `add_texts` method.
1 year ago
leo-gan 36c59e0c25
`Arxiv` document loader (#3627)
It makes sense to use `arxiv` as another source of the documents for
downloading.
- Added the `arxiv` document_loader, based on the
`utilities/arxiv.py:ArxivAPIWrapper`
- added tests
- added an example notebook
- sorted `__all__` in `__init__.py` (otherwise it is hard to find a
class in the very long list)
1 year ago
Zander Chase 443a893ffd
Align names of search tools (#3620)
Tools for Bing, DDG and Google weren't consistent even though the
underlying implementations were.
All three services now have the same tools and implementations to easily
switch and experiment when building chains.
1 year ago
Maciej Bryński aa345a4bb7
Add get_text_separator parameter to BSHTMLLoader (#3551)
By default get_text doesn't separate content of different HTML tag.
Adding option for specifying separator helps with document splitting.
1 year ago
Zander Chase ee670c448e
Persistent Bash Shell (#3580)
Clean up linting and make more idiomatic by using an output parser

---------

Co-authored-by: FergusFettes <fergusfettes@gmail.com>
1 year ago
Davis Chase d18b0caf0e
Add Anthropic default request timeout (#3540)
thanks @hitflame!

---------

Co-authored-by: Wenqiang Zhao <hitzhaowenqiang@sina.com>
Co-authored-by: delta@com <delta@com>
1 year ago
Roma 2b4e9a3efa
Add unit test for _merge_splits function (#3513)
This commit adds a new unit test for the _merge_splits function in the
text splitter. The new test verifies that the function merges text into
chunks of the correct size and overlap, using a specified separator. The
test passes on the current implementation of the function.
1 year ago
yakigac f338d6251c
Add a test for cosmos db memory (#3525)
Test for #3434 @eavanvalkenburg 
Initially, I was unaware and had submitted a pull request #3450 for the
same purpose, but I have now repurposed the one I used for that. And it
worked.
1 year ago
Harrison Chase 0fc0aa62f2
Harrison/blockchain docloader (#3491)
Co-authored-by: Jon Saginaw <saginawj@users.noreply.github.com>
1 year ago
Harrison Chase 707741de58
Harrison/prediction guard (#3490)
Co-authored-by: Daniel Whitenack <whitenack.daniel@gmail.com>
1 year ago
Harrison Chase 7257f9e015
Harrison/tfidf parameters (#3481)
Co-authored-by: pao <go5kuramubon@gmail.com>
Co-authored-by: KyoHattori <kyo.hattori@abejainc.com>
1 year ago
Harrison Chase eda69b13f3
openai embeddings (#3488) 1 year ago
Harrison Chase 408a0183cd
Harrison/weaviate (#3494)
Co-authored-by: Nick Rubell <nick@rubell.com>
1 year ago
Mindaugas Sharskus a4d85f7fd5
[Fix #3365]: Changed regex to cover new line before action serious (#3367)
Fix for: [Changed regex to cover new line before action
serious.](https://github.com/hwchase17/langchain/issues/3365)
---

This PR fixes the issue where `ValueError: Could not parse LLM output:`
was thrown on seems to be valid input.

Changed regex to cover new lines before action serious (after the
keywords "Action:" and "Action Input:").

regex101: https://regex101.com/r/CXl1kB/1

---------

Co-authored-by: msarskus <msarskus@cisco.com>
1 year ago
Davis Chase b2564a6391
fix #3884 (#3475)
fixes mar bug #3384
1 year ago
Zander Chase 416f3bdf11
Vwp/alpaca streaming (#3468)
Co-authored-by: Luke Stanley <306671+lukestanley@users.noreply.github.com>
1 year ago
cs0lar 3033c6b964
fixes #1214 (#3003)
### Background

Continuing to implement all the interface methods defined by the
`VectorStore` class. This PR pertains to implementation of the
`max_marginal_relevance_search_by_vector` method.

### Changes

- a `max_marginal_relevance_search_by_vector` method implementation has
been added in `weaviate.py`
- tests have been added to the the new method
- vcr cassettes have been added for the weaviate tests

### Test Plan

Added tests for the `max_marginal_relevance_search_by_vector`
implementation

### Change Safety

- [x] I have added tests to cover my changes
1 year ago
Zander Chase 49122a96e7
Structured Tool Bugfixes (#3324)
- Proactively raise error if a tool subclasses BaseTool, defines its
own schema, but fails to add the type-hints
- fix the auto-inferred schema of the decorator to strip the
unneeded virtual kwargs from the schema dict

Helps avoid silent instances of #3297
1 year ago
Davit Buniatyan 2c0023393b
Deep Lake mini upgrades (#3375)
Improvements
* set default num_workers for ingestion to 0
* upgraded notebooks for avoiding dataset creation ambiguity
* added `force_delete_dataset_by_path`
* bumped deeplake to 3.3.0
* creds arg passing to deeplake object that would allow custom S3

Notes
* please double check if poetry is not messed up (thanks!)

Asks
* Would be great to create a shared slack channel for quick questions

---------

Co-authored-by: Davit Buniatyan <d@activeloop.ai>
1 year ago
Zander Chase 20f530e9c5
Add Sentence Transformers Embeddings (#3409)
Add embeddings based on the sentence transformers library.
Add a notebook and integration tests.

Co-authored-by: khimaros <me@khimaros.com>
1 year ago
Luke Harris b4de839ed8
Several confluence loader improvements (#3300)
This PR addresses several improvements:

- Previously it was not possible to load spaces of more than 100 pages.
The `limit` was being used both as an overall page limit *and* as a per
request pagination limit. This, in combination with the fact that
atlassian seem to use a server-side hard limit of 100 when page content
is expanded, meant it wasn't possible to download >100 pages. Now
`limit` is used *only* as a per-request pagination limit and `max_pages`
is introduced as the way to limit the total number of pages returned by
the paginator.
- Document metadata now includes `source` (the source url), making it
compatible with `RetrievalQAWithSourcesChain`.
 - It is now possible to include inline and footer comments.
- It is now possible to pass `verify_ssl=False` and other parameters to
the confluence object for use cases that require it.
1 year ago
Harrison Chase a6664be79c
Harrison/myscale (#3352)
Co-authored-by: Fangrui Liu <fangruil@moqi.ai>
Co-authored-by: 刘 方瑞 <fangrui.liu@outlook.com>
Co-authored-by: Fangrui.Liu <fangrui.liu@ubc.ca>
1 year ago
Filip Haltmayer 215dcc2d26
Refactor Milvus/Zilliz (#3047)
Refactoring milvus/zilliz to clean up and have a more consistent
experience.

Signed-off-by: Filip Haltmayer <filip.haltmayer@zilliz.com>
1 year ago
Richy Wang 88a8f59aa7
Add a full PostgresSQL syntax database 'AnalyticDB' as vector store. (#3135)
Hi there!
I'm excited to open this PR to add support for using a fully Postgres
syntax compatible database 'AnalyticDB' as a vector.
As AnalyticDB has been proved can be used with AutoGPT,
ChatGPT-Retrieve-Plugin, and LLama-Index, I think it is also good for
you.
AnalyticDB is a distributed Alibaba Cloud-Native vector database. It
works better when data comes to large scale. The PR includes:

- [x]  A new memory: AnalyticDBVector
- [x]  A suite of integration tests verifies the AnalyticDB integration

I have read your [contributing
guidelines](72b7d76d79/.github/CONTRIBUTING.md).
And I have passed the tests below
- [x]  make format
- [x]  make lint
- [x]  make coverage
- [x]  make test
1 year ago
Zander Chase 05a8aa5447
Fix linting on master (#3327) 1 year ago
Varun Srinivas d2f922f525
Change in method name for creating an issue on JIRA (#3307)
The awesome JIRA tool created by @zywilliamli calls the `create_issue()`
method to create issues, however, the actual method is `issue_create()`.

Details in the Documentation here:
https://atlassian-python-api.readthedocs.io/jira.html#manage-issues
1 year ago
Paul Garner aa9d5707e0
Add PythonLoader which auto-detects encoding of Python files (#3311)
This PR contributes a `PythonLoader`, which inherits from
`TextLoader` but detects and sets the encoding automatically.
1 year ago
Davis Chase 2fd24d31a4
Cleanup integration test dir (#3308) 1 year ago
Naveen Tatikonda bb6c459f7a
OpenSearch: Add Support for Lucene Filter (#3201)
### Description
Add Support for Lucene Filter. When you specify a Lucene filter for a
k-NN search, the Lucene algorithm decides whether to perform an exact
k-NN search with pre-filtering or an approximate search with modified
post-filtering. This filter is supported only for approximate search
with the indexes that are created using `lucene` engine.

OpenSearch Documentation -
https://opensearch.org/docs/latest/search-plugins/knn/filter-search-knn/#lucene-k-nn-filter-implementation

Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
1 year ago
Davis Chase 46542dc774
Contextual compression retriever (#2915)
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
1 year ago
Gabriel Altay 34fb56b633
fix copy/pasta typos wikipedia->arxiv (#3222)
just updates a few module level docstrings from Wikipedia -> Arxiv
1 year ago
Harrison Chase d2520a5f1e
Harrison/ddg (#3206)
Co-authored-by: itai <itai.marks@gmail.com>
Co-authored-by: Itai Marks <itaim@users.noreply.github.com>
Co-authored-by: Tianyi Pan <60060750+tipani86@users.noreply.github.com>
Co-authored-by: Tianyi Pan <tianyi.pan@clobotics.com>
Co-authored-by: Adilzhan Ismailov <13088690+aismlv@users.noreply.github.com>
Co-authored-by: Justin Flick <Justinjayflick@gmail.com>
Co-authored-by: Justin Flick <jflick@homesite.com>
1 year ago
Harrison Chase 9a0356d276
Harrison/file chat history (#3198)
Co-authored-by: Young Lee <joybro201@gmail.com>
1 year ago
Harrison Chase 9181cd9b22
Harrison/playwright selector (#3185)
Co-authored-by: zhyuri <4649294+zhyuri@users.noreply.github.com>
1 year ago
Harrison Chase 68cd37175e
Harrison/arxiv tool (#3186)
Co-authored-by: leo-gan <leo.gan.57@gmail.com>
1 year ago
Zander Chase 4adfd790f0
Update File Management Tools to Include Root Directory (#3112)
- Permit the specification of a `root_dir` to the read/write file tools
to specify a working directory
- Add validation for attempts to read/write outside the directory (e.g.,
through `../../` or symlinks or `/abs/path`'s that don't lie in the
correct path)
- Add some tests for all


One question is whether we should make a default root directory for
these? tradeoffs either way
1 year ago