In the current solution, AgentType and AGENT_TO_CLASS are placed in two
separate files and both manually maintained. This might cause
inconsistency when we update either of them.
— latest —
Based on the discussion with hwchase17, we don’t yet know how the newly
introduced AgentTypeConfig type would be used further, so it doesn’t make
sense to add it yet. Instead, it’s better to move the dictionary to another
file to keep the loading.py file clear. Consistency is still a good point.
Instead of asserting it during linting, we added a unit test for the
consistency check. I think this works because the unit tests run
automatically every time and give a clear failure notice. (Well, a force push
is possible, but we all know what we are doing, so let’s show trust. :>)
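A minimal sketch of what such a consistency test could look like. The enum members and agent classes below are hypothetical stand-ins, not the real langchain definitions; only the names `AgentType` and `AGENT_TO_CLASS` come from this PR.

```python
from enum import Enum


class AgentType(str, Enum):
    # Hypothetical members for illustration only.
    ZERO_SHOT = "zero-shot"
    CONVERSATIONAL = "conversational"


class ZeroShotAgent:
    """Hypothetical agent class."""


class ConversationalAgent:
    """Hypothetical agent class."""


# Manually maintained mapping, kept in a separate file in the PR.
AGENT_TO_CLASS = {
    AgentType.ZERO_SHOT: ZeroShotAgent,
    AgentType.CONVERSATIONAL: ConversationalAgent,
}


def test_agent_type_and_mapping_are_consistent() -> None:
    # Every enum member needs a class, and every key must be a known member.
    missing = set(AgentType) - set(AGENT_TO_CLASS)
    extra = set(AGENT_TO_CLASS) - set(AgentType)
    assert not missing, f"AgentType members without a class: {missing}"
    assert not extra, f"AGENT_TO_CLASS keys not in AgentType: {extra}"
```

Adding a new `AgentType` member without updating the dictionary would make this test fail with the name of the missing member.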
~~This PR includes~~
- ~~Introduced AgentTypeConfig as the source of truth of all AgentType
related meta data.~~
- ~~Each AgentTypeConfig is an annotated class type which can be used for
annotation in other places.~~
- ~~Each AgentTypeConfig can be easily extended when we have more meta
data needs.~~
- ~~Strong assertion to ensure AgentType and AGENT_TO_CLASS are always
consistent.~~
- ~~Made AGENT_TO_CLASS automatically generated.~~
~~Test Plan:~~
- ~~since this change is focusing on annotation, lint is the major test
focus.~~
- ~~lint, format and test passed on local.~~
This PR includes some minor alignment updates, including:
- metadata object extended to support contractAddress, blockchainType,
and tokenId
- notebook doc better aligned to standard langchain format
- startToken changed from int to str to support multiple hex value types
on the Alchemy API
The updated metadata will look like the below. It's possible for a
single contractAddress to exist across multiple blockchains (e.g.
Ethereum, Polygon, etc.) so it's important to include the
blockchainType.
```
metadata = {
    "source": self.contract_address,
    "blockchain": self.blockchainType,
    "tokenId": tokenId,
}
```
This **partially** addresses
https://github.com/hwchase17/langchain/issues/1524, but it's also useful
for some of our use cases.
This `DocstoreFn` allows looking up a document through a function that
accepts the `search` string, without the need to implement a custom
`Docstore`.
This could be useful when:
* you don't want to implement a `Docstore` just to provide a custom
`search`
* it's expensive to construct an `InMemoryDocstore`/dict
* you retrieve documents from remote sources
* you just want to reuse existing objects
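The idea can be sketched roughly as below. This is a standalone illustration, not the class added in this PR: `FnDocstore`, the simplified `Document`, and the `REMOTE` dict are assumptions made so the example runs without langchain installed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Union


@dataclass
class Document:
    # Simplified stand-in for langchain's Document.
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


class FnDocstore:
    """Hypothetical docstore that delegates every search to a function."""

    def __init__(self, lookup_fn: Callable[[str], Union[Document, str]]) -> None:
        self._lookup_fn = lookup_fn

    def search(self, search: str) -> Union[Document, str]:
        # No index is built up front; the function is called on demand.
        return self._lookup_fn(search)


# Pretend this dict is a remote source that is expensive to copy locally.
REMOTE = {"doc-1": "hello world"}


def lookup(key: str) -> Union[Document, str]:
    text = REMOTE.get(key)
    if text is None:
        return f"Document {key!r} not found"
    return Document(page_content=text, metadata={"source": key})


store = FnDocstore(lookup)
```

Calling `store.search("doc-1")` returns a `Document`, while an unknown key returns the error string produced by `lookup`.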
Add other file utilities, including:
- List Directory
- Search for file
- Move
- Copy
- Remove file
Bundle as toolkit
Add a notebook that connects to the Chat Agent, which somewhat supports
multi-arg input tools
Update original read/write files to return the original dir paths and
better handle unsupported file paths.
Add unit tests
I think the logic of
https://github.com/hwchase17/langchain/pull/3684#pullrequestreview-1405358565
is too confusing.
I prefer this alternative because:
- All `Tool()` implementations by default will be treated the same as
before. No breaking changes.
- Less reliance on pydantic magic
- The decorator (which only is typed as returning a callable) can infer
schema and generate a structured tool
- Either way, the recommended way to create a custom tool is through
inheriting from the base tool
Tradeoffs here:
- No lint-time checking for compatibility
- Differs from JS package
- The signature inference, etc. in the base tool isn't simple
- The `args_schema` is optional
Pros:
- Forwards compatibility retained
- Doesn't break backwards compatibility
- User doesn't have to think about which class to subclass (single base
tool or dynamic `Tool` interface regardless of input)
- No need to change the load_tools, etc. interfaces
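To illustrate the point about the decorator inferring a schema, here is a rough sketch using only `inspect` and `typing`. It is not the implementation in this PR: the `infer_args_schema` helper, the plain-dict schema format, and the `move_file` example tool are assumptions for illustration.

```python
import inspect
from typing import Any, Callable, Dict, get_type_hints


def infer_args_schema(func: Callable[..., Any]) -> Dict[str, Dict[str, Any]]:
    """Build a simple schema dict from a function's signature and type hints."""
    hints = get_type_hints(func)
    schema: Dict[str, Dict[str, Any]] = {}
    for name, param in inspect.signature(func).parameters.items():
        schema[name] = {
            "type": hints.get(name, Any),
            # A parameter without a default must be supplied by the caller.
            "required": param.default is inspect.Parameter.empty,
        }
    return schema


def tool(func: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical decorator: attach the inferred schema to the function."""
    func.args_schema = infer_args_schema(func)
    return func


@tool
def move_file(source: str, destination: str, overwrite: bool = False) -> str:
    return f"moved {source} to {destination}"
```

Because the schema is derived at decoration time, mistakes only surface at runtime, which matches the "no lint-time checking" tradeoff listed above.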
Co-authored-by: Hasan Patel <mangafield@gmail.com>
This catches the warning raised when using duckdb and asserts that it's as expected.
The goal is to resolve all existing warnings to make unit-testing much stricter.
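A small sketch of the pattern, using only the standard `warnings` module. The `query_with_legacy_option` function and its message are made up for illustration and do not reflect the actual duckdb warning; in a pytest suite, `pytest.warns` would be the more usual way to write the same check.

```python
import warnings


def query_with_legacy_option() -> int:
    # Hypothetical function that emits a warning as a side effect.
    warnings.warn("legacy option is deprecated", DeprecationWarning)
    return 42


def run_and_assert_warning() -> int:
    with warnings.catch_warnings(record=True) as caught:
        # Make sure the warning is recorded even if a filter would hide it.
        warnings.simplefilter("always")
        result = query_with_legacy_option()

    # Assert that exactly the expected warning was raised.
    assert len(caught) == 1
    assert issubclass(caught[0].category, DeprecationWarning)
    assert "deprecated" in str(caught[0].message)
    return result
```

Once every known warning is captured like this, the test run can treat any remaining warning as an error.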
This PR introduces a Blob data type and a Blob loader interface.
This is the first of a sequence of PRs that follows this proposal:
https://github.com/hwchase17/langchain/pull/2833
The primary goals of these abstractions are:
* Decouple content loading from content parsing code.
* Help deduplicate content loading code across document loaders.
* Make lazy loading a default for langchain.
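A rough sketch of how these goals fit together. The field names, the `DirectoryBlobLoader` class, and the `parse_text` function are assumptions for illustration and may differ from the interfaces in this PR.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator, Optional


@dataclass(frozen=True)
class Blob:
    """A reference to raw content; nothing is read until requested."""

    path: Path
    mimetype: Optional[str] = None

    def as_bytes(self) -> bytes:
        return self.path.read_bytes()

    def as_string(self, encoding: str = "utf-8") -> str:
        return self.path.read_text(encoding=encoding)


class DirectoryBlobLoader:
    """Hypothetical loader: knows where content lives, not how to parse it."""

    def __init__(self, root: Path, glob: str = "**/*") -> None:
        self.root = root
        self.glob = glob

    def yield_blobs(self) -> Iterator[Blob]:
        # A generator keeps loading lazy: files are discovered one at a time.
        for path in sorted(self.root.glob(self.glob)):
            if path.is_file():
                yield Blob(path=path)


def parse_text(blob: Blob) -> str:
    """Hypothetical parser: knows how to parse, not where content came from."""
    return blob.as_string().strip()
```

Any loader that yields `Blob` objects can be paired with any parser that accepts them, which is where the deduplication would come from.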
This PR
* Adds `clear` method for `BaseCache` and implements it for various
caches
* Adds the default `init_func=None` and fixes the gptcache integration test
* Since the integration tests are not running in CI right now, I've verified
the changes by running `docs/modules/models/llms/examples/llm_caching.ipynb`
(until a proper e2e integration test is done in CI)
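As a sketch of the `clear` addition, assuming a cache keyed by prompt and LLM string. The method signatures below are simplified assumptions, not the exact `BaseCache` interface.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple


class BaseCache(ABC):
    """Simplified stand-in for the cache interface."""

    @abstractmethod
    def lookup(self, prompt: str, llm_string: str) -> Optional[List[str]]:
        """Return cached generations, or None on a miss."""

    @abstractmethod
    def update(self, prompt: str, llm_string: str, return_val: List[str]) -> None:
        """Store generations for a prompt."""

    @abstractmethod
    def clear(self) -> None:
        """Remove every cached entry."""


class InMemoryCache(BaseCache):
    def __init__(self) -> None:
        self._cache: Dict[Tuple[str, str], List[str]] = {}

    def lookup(self, prompt: str, llm_string: str) -> Optional[List[str]]:
        return self._cache.get((prompt, llm_string))

    def update(self, prompt: str, llm_string: str, return_val: List[str]) -> None:
        self._cache[(prompt, llm_string)] = return_val

    def clear(self) -> None:
        self._cache = {}
```

Making `clear` abstract forces each cache backend to say how it empties itself, rather than leaving callers to reach into backend internals.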
## Background
Fixes #2695
## Changes
The `add_texts` method uses the internal embedding function if one was
passed to the `Weaviate` constructor.
NOTE: the latest merge on the `Weaviate` class made the specification of
a `weaviate_api_key` mandatory, which might not be desirable for all
users and connection methods (for example, Weaviate also supports Embedded
Weaviate, which I am happy to add support for here if people think it's
desirable). I wrapped the fetching of the API key in a try/except in
order to allow the `weaviate_api_key` to be unspecified. Do let me know
if this is unsatisfactory.
## Test Plan
Added a test for the `add_texts` method.
It makes sense to use `arxiv` as another source of the documents for
downloading.
- Added the `arxiv` document_loader, based on the
`utilities/arxiv.py:ArxivAPIWrapper`
- added tests
- added an example notebook
- sorted `__all__` in `__init__.py` (otherwise it is hard to find a
class in the very long list)
Tools for Bing, DDG and Google weren't consistent even though the
underlying implementations were.
All three services now have the same tools and implementations to easily
switch and experiment when building chains.
This commit adds a new unit test for the _merge_splits function in the
text splitter. The new test verifies that the function merges text into
chunks of the correct size and overlap, using a specified separator. The
test passes on the current implementation of the function.
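For readers unfamiliar with the behavior under test, here is a simplified sketch of merging splits into size-bounded chunks with overlap. It is an illustration of the idea only and is not the text splitter's actual `_merge_splits` implementation; the function name and the exact overlap rule are assumptions.

```python
from typing import List


def merge_splits(
    splits: List[str], separator: str, chunk_size: int, chunk_overlap: int
) -> List[str]:
    """Join small pieces into chunks no longer than chunk_size characters."""
    chunks: List[str] = []
    current: List[str] = []
    for piece in splits:
        candidate = separator.join(current + [piece])
        if current and len(candidate) > chunk_size:
            # The current chunk is full, so emit it.
            chunks.append(separator.join(current))
            # Keep a tail of it as overlap: drop pieces from the front until
            # the tail fits in chunk_overlap and leaves room for the new piece.
            while current and (
                len(separator.join(current)) > chunk_overlap
                or len(separator.join(current + [piece])) > chunk_size
            ):
                current.pop(0)
        current.append(piece)
    if current:
        chunks.append(separator.join(current))
    return chunks
```

A unit test for this kind of function typically fixes `chunk_size`, `chunk_overlap`, and the separator, then compares the returned chunks against a hand-computed list.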
Test for #3434 @eavanvalkenburg
Initially, I was unaware of it and had submitted pull request #3450 for the
same purpose, but I have now repurposed that one for this. And it
worked.
Fix for: [Changed regex to cover new line before action
serious.](https://github.com/hwchase17/langchain/issues/3365)
---
This PR fixes the issue where `ValueError: Could not parse LLM output:`
was thrown on what seems to be valid input.
Changed regex to cover new lines before action serious (after the
keywords "Action:" and "Action Input:").
regex101: https://regex101.com/r/CXl1kB/1
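The effect of tolerating newlines after the keywords can be sketched as follows. The pattern below is a simplified illustration written for this example; it is not the exact regex from this PR (see the regex101 link for that), and `parse_action` is a hypothetical helper.

```python
import re
from typing import Tuple

# Allow any whitespace, including newlines, after "Action:" and "Action Input:".
ACTION_PATTERN = re.compile(
    r"Action\s*:\s*(.*?)\s*Action\s+Input\s*:\s*(.*)", re.DOTALL
)


def parse_action(llm_output: str) -> Tuple[str, str]:
    match = ACTION_PATTERN.search(llm_output)
    if match is None:
        raise ValueError(f"Could not parse LLM output: `{llm_output}`")
    return match.group(1).strip(), match.group(2).strip()
```

With this pattern, both `"Action: Search\nAction Input: weather"` and the variant where the values start on the line after each keyword parse to the same pair.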
---------
Co-authored-by: msarskus <msarskus@cisco.com>
### Background
Continuing to implement all the interface methods defined by the
`VectorStore` class. This PR pertains to implementation of the
`max_marginal_relevance_search_by_vector` method.
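For context, maximal marginal relevance picks results that are relevant to the query while penalizing similarity to results already picked. Below is a dependency-free sketch of that selection step; the function names and the `lambda_mult` weighting are assumptions for illustration, not the code in `weaviate.py`.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(
    query: Sequence[float],
    candidates: List[Sequence[float]],
    k: int = 2,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Return indices of up to k candidates chosen by MMR."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        best_idx, best_score = remaining[0], -math.inf
        for i in remaining:
            relevance = cosine(query, candidates[i])
            # Similarity to the closest already-selected candidate.
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected
```

With `lambda_mult=1.0` the selection reduces to plain similarity ranking; lower values trade relevance for diversity.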
### Changes
- a `max_marginal_relevance_search_by_vector` method implementation has
been added in `weaviate.py`
- tests have been added for the new method
- vcr cassettes have been added for the weaviate tests
### Test Plan
Added tests for the `max_marginal_relevance_search_by_vector`
implementation
### Change Safety
- [x] I have added tests to cover my changes
- Proactively raise error if a tool subclasses BaseTool, defines its
own schema, but fails to add the type-hints
- fix the auto-inferred schema of the decorator to strip the
unneeded virtual kwargs from the schema dict
Helps avoid silent instances of #3297
Improvements
* set default num_workers for ingestion to 0
* upgraded notebooks for avoiding dataset creation ambiguity
* added `force_delete_dataset_by_path`
* bumped deeplake to 3.3.0
* creds arg passing to deeplake object that would allow custom S3
Notes
* please double check if poetry is not messed up (thanks!)
Asks
* Would be great to create a shared slack channel for quick questions
---------
Co-authored-by: Davit Buniatyan <d@activeloop.ai>
This PR addresses several improvements:
- Previously it was not possible to load spaces of more than 100 pages.
The `limit` was being used both as an overall page limit *and* as a per
request pagination limit. This, in combination with the fact that
Atlassian seems to use a server-side hard limit of 100 when page content
is expanded, meant it wasn't possible to download >100 pages. Now
`limit` is used *only* as a per-request pagination limit and `max_pages`
is introduced as the way to limit the total number of pages returned by
the paginator.
- Document metadata now includes `source` (the source url), making it
compatible with `RetrievalQAWithSourcesChain`.
- It is now possible to include inline and footer comments.
- It is now possible to pass `verify_ssl=False` and other parameters to
the confluence object for use cases that require it.
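The split between `limit` and `max_pages` can be sketched as below. The fake fetch function and the `paginate` helper are assumptions for illustration; they do not call the Confluence API and only mimic a server that caps each request at 100 results.

```python
from typing import Callable, List

SERVER_HARD_LIMIT = 100  # assumption: mimics the server-side cap described above


def make_fake_fetch(total_pages: int) -> Callable[[int, int], List[int]]:
    """Build a stand-in for a paginated endpoint holding total_pages items."""
    pages = list(range(total_pages))

    def fetch(start: int, limit: int) -> List[int]:
        # The server silently caps the requested limit.
        effective = min(limit, SERVER_HARD_LIMIT)
        return pages[start : start + effective]

    return fetch


def paginate(
    fetch: Callable[[int, int], List[int]], limit: int = 50, max_pages: int = 1000
) -> List[int]:
    """Use limit per request only; max_pages bounds the overall total."""
    results: List[int] = []
    while len(results) < max_pages:
        batch = fetch(len(results), limit)
        if not batch:
            break  # no more pages on the server
        results.extend(batch)
    return results[:max_pages]
```

Because the loop keeps requesting until the server returns an empty batch or `max_pages` is reached, a space with more than 100 pages is fully retrievable even when each request is capped.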
Hi there!
I'm excited to open this PR to add support for using AnalyticDB, a fully
Postgres-syntax-compatible database, as a vector store.
AnalyticDB has already been shown to work with AutoGPT,
ChatGPT-Retrieve-Plugin, and LLama-Index, so I think it would be a good fit
here too.
AnalyticDB is a distributed, cloud-native vector database from Alibaba Cloud.
It works better as data grows to large scale. The PR includes:
- [x] A new memory: AnalyticDBVector
- [x] A suite of integration tests that verifies the AnalyticDB integration
I have read your [contributing
guidelines](72b7d76d79/.github/CONTRIBUTING.md).
And I have passed the tests below
- [x] make format
- [x] make lint
- [x] make coverage
- [x] make test
### Description
Add support for the Lucene filter. When you specify a Lucene filter for a
k-NN search, the Lucene algorithm decides whether to perform an exact
k-NN search with pre-filtering or an approximate search with modified
post-filtering. This filter is supported only for approximate search
on indexes that are created using the `lucene` engine.
OpenSearch Documentation -
https://opensearch.org/docs/latest/search-plugins/knn/filter-search-knn/#lucene-k-nn-filter-implementation
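For reference, a sketch of the request body shape that the linked OpenSearch documentation describes, where the filter sits inside the `knn` clause next to the vector. The helper name, the `vector_field` default, and the example filter are assumptions for illustration and are not the parameters added in this PR.

```python
from typing import Any, Dict, List


def build_lucene_filter_query(
    query_vector: List[float],
    filter_clause: Dict[str, Any],
    vector_field: str = "vector_field",
    k: int = 4,
) -> Dict[str, Any]:
    """Build an approximate k-NN search body with an embedded Lucene filter."""
    return {
        "size": k,
        "query": {
            "knn": {
                vector_field: {
                    "vector": query_vector,
                    "k": k,
                    # The filter lives inside the knn clause, so the engine can
                    # choose between exact and approximate search itself.
                    "filter": filter_clause,
                }
            }
        },
    }


example_filter = {"bool": {"must": [{"term": {"genre": "fiction"}}]}}
body = build_lucene_filter_query([0.1, 0.2, 0.3], example_filter, k=3)
```

The resulting dict would be sent as the search body against an index whose vector field was created with the `lucene` engine.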
Signed-off-by: Naveen Tatikonda <navtat@amazon.com>