Hello
1) Passing `embedding_function` as a callable seems to be outdated and
the common interface is to pass an `Embeddings` instance.
2) At the moment `Qdrant.add_texts` is designed to be used with
`embeddings.embed_query`, which is (a) slow and (b) a source of ambiguity because of (a).
It should be used with `embeddings.embed_documents` instead.
This PR solves both problems and also provides some new tests
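For illustration, a minimal sketch of the intended behaviour (only `embed_documents`/`embed_query` come from the `Embeddings` interface; the helper name is hypothetical, not the actual Qdrant code):
```python
from typing import Iterable, List

class EmbeddingsLike:
    """Subset of the common `Embeddings` interface assumed here."""
    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        ...
    def embed_query(self, text: str) -> List[float]:
        ...

def embed_for_add_texts(embeddings: EmbeddingsLike, texts: Iterable[str]) -> List[List[float]]:
    # One batched embed_documents call instead of one embed_query call per text:
    # faster, and unambiguous about which method embeds stored documents.
    return embeddings.embed_documents(list(texts))
```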
- Update the RunCreate object to work with recent changes
- Add optional Example ID to the tracer
- Adjust default persist_session behavior to attempt to load the session
if it exists
- Raise more useful HTTP errors for logging
- Add unit testing
- Fix the default ID to be a UUID for v2 tracer sessions
Broken out from the big draft here:
https://github.com/hwchase17/langchain/pull/4061
- confirm creation
- confirm functionality with a simple dimension check.
The test currently calls the OpenAI API directly, but I learned from
@vowelparrot that we're caching the requests, so it's not that
expensive. I also found that we call the OpenAI API in other integration
tests. Please let me know if there is any concern about making real external
API calls; I can alternatively make a fake LLM for this test. Thanks
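Roughly the kind of check the test performs (a sketch, assuming the default `text-embedding-ada-002` model and its 1536-dimension output):
```python
from langchain.embeddings import OpenAIEmbeddings

def test_openai_embedding_dimension() -> None:
    embeddings = OpenAIEmbeddings()
    output = embeddings.embed_query("Hello world")
    # text-embedding-ada-002 returns 1536-dimensional vectors
    assert len(output) == 1536
```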
This implements a loader of text passages in JSON format. The `jq`
syntax is used to define a schema for accessing the relevant contents
from the JSON file. This requires dependency on the `jq` package:
https://pypi.org/project/jq/.
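A usage sketch (assuming the loader is exposed as `JSONLoader` and accepts a `jq_schema` argument; the file name and schema are illustrative):
```python
from langchain.document_loaders import JSONLoader

# Extract the `content` field of every item in the `messages` array.
loader = JSONLoader(file_path="chat.json", jq_schema=".messages[].content")
docs = loader.load()
```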
---------
Signed-off-by: Aivin V. Solatorio <avsolatorio@gmail.com>
This PR updates the `message_line_regex` used by `WhatsAppChatLoader` to
support different date-time formats used in WhatsApp chat exports;
resolves #4153.
The new regex handles the following input formats:
```terminal
[05.05.23, 15:48:11] James: Hi here
[11/8/21, 9:41:32 AM] User name: Message 123
1/23/23, 3:19 AM - User 2: Bye!
1/23/23, 3:22_AM - User 1: And let me know if anything changes
```
Tests have been added to verify that the loader works correctly with all
formats.
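For illustration, a regex in this spirit that matches all four example lines above (a sketch, not necessarily the exact pattern merged):
```python
import re

message_line_regex = re.compile(
    r"\[?"                                   # optional opening bracket
    r"(\d{1,2}[./-]\d{1,2}[./-]\d{2,4}),\s"  # date: 05.05.23 or 11/8/21
    r"(\d{1,2}:\d{2}(?::\d{2})?)"            # time: 15:48:11 or 3:19
    r"(?:[ _]([AaPp][Mm]))?"                 # optional AM/PM marker
    r"\]?[\s-]*"                             # optional closing bracket and dash
    r"([^:]+): "                             # sender
    r"(.+)"                                  # message text
)
```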
The forward ref annotations don't get updated if we only import with
type checking
---------
Co-authored-by: Abhinav Verma <abhinav_win12@yahoo.co.in>
* implemented arun, results, and aresults. Reuses aiosession if
available.
* helper tools GoogleSerperRun and GoogleSerperResults
* support for Google Images, Places and News (examples given) and
filtering based on time (e.g. past hour)
* updated docs
This PR includes two main changes:
- Refactor the `TelegramChatLoader` and `FacebookChatLoader` classes by
removing the dependency on pandas and simplifying the message filtering
process.
- Add test cases for the `TelegramChatLoader` and `FacebookChatLoader`
classes. These tests ensure that the classes correctly load and process
the example chat data, providing better test coverage for this
functionality.
The Blockchain Document Loader's default behavior is to return 100
tokens at a time which is the Alchemy API limit. The Document Loader
exposes a startToken that can be used for pagination against the API.
This enhancement includes an optional get_all_tokens param (default:
False) which will:
- Iterate over the Alchemy API until it receives all the tokens, and
return the tokens in a single call to the loader.
- Manage all/most tokenId formats (this can be an int, hex16 with no
leading zeros, or hex16 with all leading zeros). There are no constraints
on how smart contracts can represent this value, but these three are the most common.
Note that a contract with 10,000 tokens will issue 100 calls to the
Alchemy API, and could take about a minute, which is why this param will
default to False. But I've been using the doc loader with these
utilities on the side, so figured it might make sense to build them in
for others to use.
Seems the pyllamacpp package is no longer the supported bindings from
gpt4all. Tested that this works locally.
Given that the older models weren't very performant, I think it's better
to migrate now without trying to include a lot of try / except blocks
---------
Co-authored-by: Nissan Pow <npow@users.noreply.github.com>
Co-authored-by: Nissan Pow <pownissa@amazon.com>
- ActionAgent has a property called `allowed_tools`, which is declared
as `List`. It stores all provided tools that are available for use during
agent actions.
- This collection shouldn't allow duplicates. The original datatype List
doesn't make sense; each tool should be unique. Even when there are
variants (assuming in the future), they would be named differently in
load_tools.
Test:
- confirm the functionality in an example by initializing an agent with
a list of 2 tools and confirm everything works.
```python3
def test_agent_chain_chat_bot():
    from langchain.agents import load_tools
    from langchain.agents import initialize_agent
    from langchain.agents import AgentType
    from langchain.chat_models import ChatOpenAI
    from langchain.llms import OpenAI
    from langchain.utilities.duckduckgo_search import DuckDuckGoSearchAPIWrapper

    chat = ChatOpenAI(temperature=0)
    llm = OpenAI(temperature=0)
    tools = load_tools(["ddg-search", "llm-math"], llm=llm)
    agent = initialize_agent(tools, chat, agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION, verbose=True)
    agent.run("Who is Olivia Wilde's boyfriend? What is his current age raised to the 0.23 power?")

test_agent_chain_chat_bot()
```
Result:
<img width="863" alt="Screenshot 2023-05-01 at 7 58 11 PM"
src="https://user-images.githubusercontent.com/62768671/235572157-0937594c-ddfb-4760-acb2-aea4cacacd89.png">
Modified Modern Treasury and Stripe slightly so credentials don't have to
be passed in explicitly. Thanks @mattgmarcus for adding Modern Treasury!
---------
Co-authored-by: Matt Marcus <matt.g.marcus@gmail.com>
Haven't gotten to all of them, but this:
- Updates some of the tools notebooks to actually instantiate a tool
(many just show a 'utility' rather than a tool. More changes to come in
separate PR)
- Move the `Tool` and decorator definitions to `langchain/tools/base.py`
(but still export from `langchain.agents`)
- Add scene explain to the load_tools() function
- Add unit tests for public apis for the langchain.tools and langchain.agents modules
Move tool validation to each implementation of the Agent.
Another alternative would be to adjust the `_validate_tools()` signature
to accept the output parser (and format instructions) and add logic
there. Something like
`parser.outputs_structured_actions(format_instructions)`
But don't think that's needed right now.
- Add langchain.llms.GooglePalm for text completion,
- Add langchain.chat_models.ChatGooglePalm for chat completion,
- Add langchain.embeddings.GooglePalmEmbeddings for sentence embeddings,
- Add example field to HumanMessage and AIMessage so that users can feed
in examples into the PaLM Chat API,
- Add system and unit tests.
Note async completion for the Text API is not yet supported and will be
included in a future PR.
Happy for feedback on any aspect of this PR, especially our choice of
adding an example field to Human and AI Message objects to enable
passing example messages to the API.
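For example, something along these lines (a sketch; the message contents are made up):
```python
from langchain.schema import AIMessage, HumanMessage

# Few-shot example pairs are flagged with example=True so they can be passed
# to the PaLM Chat API as examples rather than as conversation history.
messages = [
    HumanMessage(content="What is 2 + 2?", example=True),
    AIMessage(content="4", example=True),
    HumanMessage(content="What is 3 + 3?"),
]
```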
This pull request adds unit tests for various output parsers
(BooleanOutputParser, CommaSeparatedListOutputParser, and
StructuredOutputParser) to ensure their correct functionality and to
increase code reliability and maintainability. The tests cover both
valid and invalid input cases.
Changes:
Added unit tests for BooleanOutputParser.
Added unit tests for CommaSeparatedListOutputParser.
Added unit tests for StructuredOutputParser.
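For example, the BooleanOutputParser tests look roughly like this (a sketch, assuming the parser lives at `langchain.output_parsers.boolean`; the exact parametrization differs):
```python
import pytest
from langchain.output_parsers.boolean import BooleanOutputParser

def test_boolean_output_parser_parse() -> None:
    parser = BooleanOutputParser()
    assert parser.parse("YES") is True
    assert parser.parse("NO") is False

def test_boolean_output_parser_invalid_input() -> None:
    parser = BooleanOutputParser()
    with pytest.raises(ValueError):
        parser.parse("MAYBE")
```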
Testing:
All new unit tests have been executed, and they pass successfully.
The overall test suite has been run, and all tests pass.
Notes:
These tests cover both successful parsing scenarios and error handling
for invalid inputs.
If any new output parsers are added in the future, corresponding unit
tests should also be created to maintain coverage.
Enum to string conversion handled differently between python 3.9 and
3.11, currently breaking in 3.11 (see #3788). Thanks @peter-brady for
catching this!
In the current solution, AgentType and AGENT_TO_CLASS are placed in two
separate files and both manually maintained. This might cause
inconsistency when we update either of them.
— latest —
Based on the discussion with hwchase17, we don't yet know how to further use
the newly introduced AgentTypeConfig type, so it doesn't make sense to
add it yet. Instead, it's better to move the dictionary to another file
to keep loading.py clear. Consistency is a good point: instead of
asserting consistency during linting, we added a unit test for the
consistency check. I think this works, since the unit test is triggered
automatically every time and fails with a clear notice. (Well, force-push is
possible, but we all know what we are doing, so let's show trust. :>)
~~This PR includes~~
- ~~Introduced AgentTypeConfig as the source of truth of all AgentType
related meta data.~~
- ~~Each AgentTypeConfig is a annotated class type which can be used for
annotation in other places.~~
- ~~Each AgentTypeConfig can be easily extended when we have more meta
data needs.~~
- ~~Strong assertion to ensure AgentType and AGENT_TO_CLASS are always
consistent.~~
- ~~Made AGENT_TO_CLASS automatically generated.~~
~~Test Plan:~~
- ~~since this change is focusing on annotation, lint is the major test
focus.~~
- ~~lint, format and test passed on local.~~
This PR includes some minor alignment updates, including:
- metadata object extended to support contractAddress, blockchainType,
and tokenId
- notebook doc better aligned to standard langchain format
- startToken changed from int to str to support multiple hex value types
on the Alchemy API
The updated metadata will look like the below. It's possible for a
single contractAddress to exist across multiple blockchains (e.g.
Ethereum, Polygon, etc.) so it's important to include the
blockchainType.
```
metadata = {
    "source": self.contract_address,
    "blockchain": self.blockchainType,
    "tokenId": tokenId,
}
```
This **partially** addresses
https://github.com/hwchase17/langchain/issues/1524, but it's also useful
for some of our use cases.
This `DocstoreFn` allows looking up a document given a function that
accepts the `search` string, without the need to implement a custom
`Docstore`.
This could be useful when:
* you don't want to implement a `Docstore` just to provide a custom
`search`
* it's expensive to construct an `InMemoryDocstore`/dict
* you retrieve documents from remote sources
* you just want to reuse existing objects
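A conceptual sketch of how such a function-backed docstore could be used (the constructor argument and exact import path are assumptions; check the module for the real ones):
```python
from langchain.docstore.document import Document

def lookup(search: str) -> Document:
    # e.g. fetch from a remote source, or reuse an object you already hold in memory
    return Document(page_content=f"content for {search}", metadata={"id": search})

# Hypothetical usage:
# docstore = DocstoreFn(lookup_function=lookup)
# doc = docstore.search("some-document-id")
```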
Add other File Utilities, include
- List Directory
- Search for file
- Move
- Copy
- Remove file
Bundle as toolkit
Add a notebook that connects to the Chat Agent, which somewhat supports
multi-arg input tools
Update original read/write files to return the original dir paths and
better handle unsupported file paths.
Add unit tests
I think the logic of
https://github.com/hwchase17/langchain/pull/3684#pullrequestreview-1405358565
is too confusing.
I prefer this alternative because:
- All `Tool()` implementations by default will be treated the same as
before. No breaking changes.
- Less reliance on pydantic magic
- The decorator (which is only typed as returning a callable) can infer
schema and generate a structured tool
- Either way, the recommended way to create a custom tool is through
inheriting from the base tool
Tradeoffs here:
- No lint-time checking for compatibility
- Differs from JS package
- The signature inference, etc. in the base tool isn't simple
- The `args_schema` is optional
Pros:
- Forwards compatibility retained
- Doesn't break backwards compatibility
- User doesn't have to think about which class to subclass (single base
tool or dynamic `Tool` interface regardless of input)
- No need to change the load_tools, etc. interfaces
Co-authored-by: Hasan Patel <mangafield@gmail.com>
This catches the warning raised when using duckdb and asserts that it's as expected.
The goal is to resolve all existing warnings to make unit-testing much stricter.
This PR introduces a Blob data type and a Blob loader interface.
This is the first of a sequence of PRs that follows this proposal:
https://github.com/hwchase17/langchain/pull/2833
The primary goals of these abstraction are:
* Decouple content loading from content parsing code.
* Help de-duplicate content loading code across document loaders.
* Make lazy loading a default for langchain.
This PR
* Adds `clear` method for `BaseCache` and implements it for various
caches
* Adds the default `init_func=None` and fixes gptcache integtest
* Since right now integtest is not running in CI, I've verified the
changes by running `docs/modules/models/llms/examples/llm_caching.ipynb`
(until proper e2e integtest is done in CI)
## Background
fixes #2695
## Changes
The `add_texts` method uses the internal embedding function if one was
passed to the `Weaviate` constructor.
NOTE: the latest merge on the `Weaviate` class made the specification of
a `weaviate_api_key` mandatory, which might not be desirable for all
users and connection methods (for example, Weaviate also supports Embedded
Weaviate, which I am happy to add support for here if people think it's
desirable). I wrapped the fetching of the API key in a try/except in
order to allow the `weaviate_api_key` to be unspecified. Do let me know
if this is unsatisfactory.
## Test Plan
added test for `add_texts` method.
It makes sense to use `arxiv` as another source of documents for
downloading.
- Added the `arxiv` document_loader, based on the
`utilities/arxiv.py:ArxivAPIWrapper`
- added tests
- added an example notebook
- sorted `__all__` in `__init__.py` (otherwise it is hard to find a
class in the very long list)
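A usage sketch (assuming the loader is exposed as `ArxivLoader` and mirrors the `ArxivAPIWrapper` options; the query is illustrative):
```python
from langchain.document_loaders import ArxivLoader

loader = ArxivLoader(query="1605.08386", load_max_docs=2)
docs = loader.load()
print(docs[0].metadata)  # title, authors, published date, etc.
```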
Tools for Bing, DDG and Google weren't consistent even though the
underlying implementations were.
All three services now have the same tools and implementations to easily
switch and experiment when building chains.
This commit adds a new unit test for the _merge_splits function in the
text splitter. The new test verifies that the function merges text into
chunks of the correct size and overlap, using a specified separator. The
test passes on the current implementation of the function.
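A sketch of the kind of test added (assuming the default length function `len` on `CharacterTextSplitter`):
```python
from langchain.text_splitter import CharacterTextSplitter

def test_merge_splits() -> None:
    splitter = CharacterTextSplitter(separator=" ", chunk_size=9, chunk_overlap=2)
    splits = ["foo", "bar", "baz", "123"]
    # Splits are merged until the chunk size (including separators) would be exceeded.
    assert splitter._merge_splits(splits, separator=" ") == ["foo bar", "baz 123"]
```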
Test for #3434 @eavanvalkenburg
Initially, I was unaware and had submitted pull request #3450 for the
same purpose, but I have now repurposed the one I used for that, and it
worked.
Fix for: [Changed regex to cover new line before action
serious.](https://github.com/hwchase17/langchain/issues/3365)
---
This PR fixes the issue where `ValueError: Could not parse LLM output:`
was thrown on what seems to be valid input.
The regex was changed to also cover new lines after the keywords
"Action:" and "Action Input:".
regex101: https://regex101.com/r/CXl1kB/1
---------
Co-authored-by: msarskus <msarskus@cisco.com>
### Background
Continuing to implement all the interface methods defined by the
`VectorStore` class. This PR pertains to implementation of the
`max_marginal_relevance_search_by_vector` method.
### Changes
- a `max_marginal_relevance_search_by_vector` method implementation has
been added in `weaviate.py`
- tests have been added for the new method
- vcr cassettes have been added for the weaviate tests
### Test Plan
Added tests for the `max_marginal_relevance_search_by_vector`
implementation
### Change Safety
- [x] I have added tests to cover my changes
- Proactively raise error if a tool subclasses BaseTool, defines its
own schema, but fails to add the type-hints
- fix the auto-inferred schema of the decorator to strip the
unneeded virtual kwargs from the schema dict
Helps avoid silent instances of #3297
Improvements
* set default num_workers for ingestion to 0
* upgraded notebooks for avoiding dataset creation ambiguity
* added `force_delete_dataset_by_path`
* bumped deeplake to 3.3.0
* creds arg passing to deeplake object that would allow custom S3
Notes
* please double check if poetry is not messed up (thanks!)
Asks
* Would be great to create a shared slack channel for quick questions
---------
Co-authored-by: Davit Buniatyan <d@activeloop.ai>
This PR addresses several improvements:
- Previously it was not possible to load spaces of more than 100 pages.
The `limit` was being used both as an overall page limit *and* as a per
request pagination limit. This, in combination with the fact that
atlassian seem to use a server-side hard limit of 100 when page content
is expanded, meant it wasn't possible to download >100 pages. Now
`limit` is used *only* as a per-request pagination limit and `max_pages`
is introduced as the way to limit the total number of pages returned by
the paginator.
- Document metadata now includes `source` (the source url), making it
compatible with `RetrievalQAWithSourcesChain`.
- It is now possible to include inline and footer comments.
- It is now possible to pass `verify_ssl=False` and other parameters to
the confluence object for use cases that require it.
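A usage sketch under these changes (parameter names are assumptions based on the description above):
```python
from langchain.document_loaders import ConfluenceLoader

loader = ConfluenceLoader(url="https://yoursite.atlassian.net/wiki", username="me", api_key="...")
documents = loader.load(
    space_key="SPACE",
    limit=50,          # per-request pagination limit
    max_pages=1000,    # overall cap on the number of returned pages
    include_comments=True,
)
```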
Hi there!
I'm excited to open this PR to add support for using 'AnalyticDB', a fully
Postgres-syntax-compatible database, as a vector store.
As AnalyticDB has been proven to work with AutoGPT,
ChatGPT-Retrieve-Plugin, and LLama-Index, I think it is also a good fit for
you.
AnalyticDB is a distributed Alibaba Cloud-native vector database. It
works better when data comes at large scale. The PR includes:
- [x] A new memory: AnalyticDBVector
- [x] A suite of integration tests verifies the AnalyticDB integration
I have read your [contributing
guidelines](72b7d76d79/.github/CONTRIBUTING.md).
And I have passed the tests below
- [x] make format
- [x] make lint
- [x] make coverage
- [x] make test
### Description
Add Support for Lucene Filter. When you specify a Lucene filter for a
k-NN search, the Lucene algorithm decides whether to perform an exact
k-NN search with pre-filtering or an approximate search with modified
post-filtering. This filter is supported only for approximate search
with the indexes that are created using `lucene` engine.
OpenSearch Documentation -
https://opensearch.org/docs/latest/search-plugins/knn/filter-search-knn/#lucene-k-nn-filter-implementation
Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
- Permit the specification of a `root_dir` to the read/write file tools
to specify a working directory
- Add validation for attempts to read/write outside the directory (e.g.,
through `../../` or symlinks or `/abs/path`'s that don't lie in the
correct path)
- Add some tests for all
One question is whether we should set a default root directory for
these; there are tradeoffs either way.
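Usage would look roughly like this (a sketch; the tool class names are assumed from the existing read/write file tools):
```python
from langchain.tools.file_management import ReadFileTool, WriteFileTool

# All reads/writes are confined to this working directory; attempts to escape it
# (e.g. "../../etc/passwd", absolute paths, symlinks) are rejected.
read_tool = ReadFileTool(root_dir="/tmp/agent_workspace")
write_tool = WriteFileTool(root_dir="/tmp/agent_workspace")

write_tool.run({"file_path": "notes.txt", "text": "hello"})
print(read_tool.run({"file_path": "notes.txt"}))
```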
`langchain.prompts.PromptTemplate` and
`langchain.prompts.FewShotPromptTemplate` do not validate
`input_variables` when initialized as `jinja2` template.
```python
# Using langchain v0.0.144
template = """\
Your variable: {{ foo }}
{% if bar %}
You just set bar boolean variable to true
{% endif %}
"""
# Missing variable, should raise ValueError
prompt_template = PromptTemplate(
    template=template,
    input_variables=["bar"],
    template_format="jinja2",
    validate_template=True,
)
# Extra variable, should raise ValueError
prompt_template = PromptTemplate(
    template=template,
    input_variables=["bar", "foo", "extra", "thing"],
    template_format="jinja2",
    validate_template=True,
)
```
- Remove dynamic model creation in the `args()` property. _Only infer
for the decorator (and add an argument to NOT infer if someone wishes to
only pass as a string)_
- Update the validation example to make it less likely to be
misinterpreted as a "safe" way to run a repl
There is one example of "Multi-argument tools" in the custom_tools.ipynb
from yesterday, but we could add more. The output parsing for the base
MRKL agent hasn't been adapted to handle structured args at this point
in time
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
`langchain.prompts.PromptTemplate` is unable to infer `input_variables`
from jinja2 template.
```python
# Using langchain v0.0.141
template_string = """\
Hello world
Your variable: {{ var }}
{# This will not get rendered #}
{% if verbose %}
Congrats! You just turned on verbose mode and got extra messages!
{% endif %}
"""
template = PromptTemplate.from_template(template_string, template_format="jinja2")
print(template.input_variables) # Output ['# This will not get rendered #', '% endif %', '% if verbose %']
```
---------
Co-authored-by: engkheng <ongengkheng929@example.com>
Add a time-weighted memory retriever and a notebook that approximates a
Generative Agent from https://arxiv.org/pdf/2304.03442.pdf
The "daily plan" components are removed for now since they are less
useful without a virtual world, but the memory is an interesting
component to build off.
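Construction looks roughly like this (a sketch, assuming a FAISS store and OpenAI embeddings; the parameter values are illustrative):
```python
import faiss
from langchain.docstore import InMemoryDocstore
from langchain.embeddings import OpenAIEmbeddings
from langchain.retrievers import TimeWeightedVectorStoreRetriever
from langchain.vectorstores import FAISS

embeddings = OpenAIEmbeddings()
index = faiss.IndexFlatL2(1536)
vectorstore = FAISS(embeddings.embed_query, index, InMemoryDocstore({}), {})

# Relevance is a blend of semantic similarity and recency, decayed by decay_rate.
retriever = TimeWeightedVectorStoreRetriever(vectorstore=vectorstore, decay_rate=0.01, k=4)
```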
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
### Background
Continuing to implement all the interface methods defined by the
`VectorStore` class. This PR pertains to implementation of the
`max_marginal_relevance_search` method.
### Changes
- a `max_marginal_relevance_search` method implementation has been added
in `weaviate.py`
- tests have been added for the new method
- vcr cassettes have been added for the weaviate tests
### Test Plan
Added tests for the `max_marginal_relevance_search` implementation
### Change Safety
- [x] I have added tests to cover my changes
Use numexpr evaluate instead of the python REPL to avoid malicious code
injection.
Tested against the (limited) math dataset and got the same score as
before.
For more permissive tools (like the REPL tool itself), other approaches
ought to be provided (some combination of Sanitizer + Restricted python
+ unprivileged-docker + ...), but for a calculator tool, only
mathematical expressions should be permitted.
See https://github.com/hwchase17/langchain/issues/814
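The core idea, as a sketch:
```python
import numexpr

def evaluate_math(expression: str) -> str:
    # numexpr only understands mathematical expressions, so arbitrary Python
    # (imports, attribute access, etc.) is rejected rather than executed.
    return str(numexpr.evaluate(expression.strip()))

print(evaluate_math("2 ** 10"))     # 1024
print(evaluate_math("37593 * 67"))  # 2518731
```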
Add a method that exposes a similarity search with corresponding
normalized similarity scores. Implement only for FAISS now.
### Motivation:
Some memory definitions combine `relevance` with other scores, like
recency, importance, etc.
While many (but not all) of the `VectorStore`s expose a
`similarity_search_with_score` method, they don't all interpret the
units of that score (it depends on the distance metric and whether or not
the embeddings are normalized).
This PR proposes a `similarity_search_with_normalized_similarities`
method that lets consumers of the vector store not have to worry about
the metric and embedding scale.
*Most providers default to euclidean distance, with Pinecone being one
exception (defaults to cosine _similarity_).*
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Minor cosmetic changes
- Activeloop environment cred authentication in notebooks with
`getpass.getpass` (instead of the CLI, which does not always work)
- much faster tests with Deep Lake pytest mode on
- Deep Lake kwargs pass
Notes
- I put pytest environment creds inside `vectorstores/conftest.py`, but
feel free to suggest a better location. For context, if I put them in
`test_deeplake.py`, `ruff` doesn't let me set them before importing
deeplake
---------
Co-authored-by: Davit Buniatyan <d@activeloop.ai>
Note to self: Always run integration tests, even on "that last minute
change you thought would be safe" :)
---------
Co-authored-by: Mike Lambert <mike.lambert@anthropic.com>
* Adds an Anthropic ChatModel
* Factors out common code in our LLMModel and ChatModel
* Supports streaming llm-tokens to the callbacks on a delta basis (until
a future V2 API does that for us)
* Some fixes
Fixes linting issue from #2835
Adds a loader for Slack Exports which can be a very valuable source of
knowledge to use for internal QA bots and other use cases.
```py
# Export data from your Slack Workspace first.
from langchain.document_loaders import SlackDirectoryLoader
SLACK_WORKSPACE_URL = "https://awesome.slack.com"
loader = SlackDirectoryLoader("Slack_Exports", SLACK_WORKSPACE_URL)
docs = loader.load()
```
When the code run by the PythonAstREPLTool contains multiple statements,
it will fall back to exec() instead of using eval(). With this change, it
will also return the output of the code in the same way the
PythonREPLTool does.
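A sketch of the exec()/eval() fallback (illustrative, not the exact implementation):
```python
import ast
from contextlib import redirect_stdout
from io import StringIO

def run_python(code: str, namespace: dict) -> str:
    tree = ast.parse(code)
    if not tree.body:
        return ""
    # Execute everything except the last node, then try to eval the last node
    # so its value can be returned, mirroring PythonREPLTool's output behaviour.
    exec(compile(ast.Module(body=tree.body[:-1], type_ignores=[]), "<ast>", "exec"), namespace)
    last = tree.body[-1]
    if isinstance(last, ast.Expr):
        return str(eval(compile(ast.Expression(body=last.value), "<ast>", "eval"), namespace))
    # Last statement is not an expression; execute it and capture stdout instead.
    buffer = StringIO()
    with redirect_stdout(buffer):
        exec(compile(ast.Module(body=[last], type_ignores=[]), "<ast>", "exec"), namespace)
    return buffer.getvalue()
```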
Currently, the output type of a number of OutputParser's `parse` methods
is `Any` when it can in fact be inferred.
This PR makes BaseOutputParser use a generic type and fixes the output
types of the following parsers:
- `PydanticOutputParser`
- `OutputFixingParser`
- `RetryOutputParser`
- `RetryWithErrorOutputParser`
The output of the `StructuredOutputParser` is corrected from `BaseModel`
to `Any` since there are no type guarantees provided by the parser.
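Schematically, the generic typing looks like this (simplified; the real base class also derives from pydantic's BaseModel):
```python
from abc import ABC, abstractmethod
from typing import Generic, List, TypeVar

T = TypeVar("T")

class BaseOutputParser(ABC, Generic[T]):
    @abstractmethod
    def parse(self, text: str) -> T:
        ...

class CommaSeparatedListOutputParser(BaseOutputParser[List[str]]):
    def parse(self, text: str) -> List[str]:
        return [part.strip() for part in text.split(",")]

parsed: List[str] = CommaSeparatedListOutputParser().parse("a, b, c")
```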
Fixes issue #2715
Currently, the function still fails if `continue_on_failure` is set to
True, because `elements` is not set.
---------
Co-authored-by: leecjohnny <johnny-lee1255@users.noreply.github.com>
Add more missed imports for integration tests. Bump `pytest` to the
current latest version.
Fix `tests/integration_tests/vectorstores/test_elasticsearch.py` to
update its cassette (easy fix).
Related PR: https://github.com/hwchase17/langchain/pull/2560
I've added a bilibili loader, bilibili is a very active video site in
China and I think we need this loader.
Example:
```python
from langchain.document_loaders.bilibili import BiliBiliLoader

loader = BiliBiliLoader(
    [
        "https://www.bilibili.com/video/BV1xt411o7Xu/",
        "https://www.bilibili.com/video/av330407025/",
    ]
)
docs = loader.load()
```
Co-authored-by: 了空 <568250549@qq.com>
**Description**
Add custom vector field name and text field name while indexing and
querying for OpenSearch
**Issues**
https://github.com/hwchase17/langchain/issues/2500
Signed-off-by: Naveen Tatikonda <navtat@amazon.com>
`combine_docs` does not go through the standard chain call path, which
means that chain callbacks won't be triggered, meaning QA chains won't
be traced properly; this fixes that.
Also fix several errors in the chat_vector_db notebook
Adds a new PDF loader using the existing dependency on PDFMiner.
The new loader can be helpful for chunking texts semantically into
sections, as the output HTML content can be parsed via `BeautifulSoup` to
get more structured and rich information about font size, page numbers,
PDF headers/footers, etc., which may not be available otherwise with
other PDF loaders.
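A sketch of that workflow (assuming the loader is exposed as `PDFMinerPDFasHTMLLoader`; the styling heuristic is illustrative):
```python
from bs4 import BeautifulSoup
from langchain.document_loaders import PDFMinerPDFasHTMLLoader

loader = PDFMinerPDFasHTMLLoader("example.pdf")
html = loader.load()[0].page_content

soup = BeautifulSoup(html, "html.parser")
for span in soup.find_all("span"):
    style = span.get("style", "")
    # font-size in the inline style can help distinguish headers from body text
    print(style, span.get_text()[:40])
```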
Almost all integration tests have failed, but we haven't encountered any
import errors yet. Some tests failed due to lazy import issues. It
doesn't seem like a problem to resolve some of these errors in the next
PR.
I have a headache from resolving conflicts with `deeplake` and `boto3`,
so I will temporarily comment out `boto3`.
fix https://github.com/hwchase17/langchain/issues/2426
Using `pytest-vcr` in integration tests has several benefits. Firstly,
it removes the need to mock external services, as VCR records and
replays HTTP interactions on the fly. Secondly, it simplifies the
integration test setup by eliminating the need to set up and tear down
external services in some cases. Finally, it allows for more reliable
and deterministic integration tests by ensuring that HTTP interactions
are always replayed with the same response.
Overall, `pytest-vcr` is a valuable tool for simplifying integration
test setup and improving test reliability.
This commit adds the `pytest-vcr` package as a dependency for
integration tests in the `pyproject.toml` file. It also introduces two
new fixtures in `tests/integration_tests/conftest.py` files for managing
cassette directories and VCR configurations.
In addition, the
`tests/integration_tests/vectorstores/test_elasticsearch.py` file has
been updated to use the `@pytest.mark.vcr` decorator for recording and
replaying HTTP interactions.
Finally, this commit removes the `documents` fixture from the
`test_elasticsearch.py` file and replaces it with a new fixture defined
in `tests/integration_tests/vectorstores/conftest.py` that yields a list
of documents to use in any other tests.
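The conftest fixtures look roughly like this (a sketch using pytest-vcr's standard override hooks; the filter values are illustrative):
```python
import os

import pytest

@pytest.fixture(scope="module")
def vcr_cassette_dir(request) -> str:
    # Store cassettes next to the test module, grouped per module.
    module_name = request.module.__name__.split(".")[-1]
    return os.path.join(os.path.dirname(request.module.__file__), "cassettes", module_name)

@pytest.fixture(scope="module")
def vcr_config() -> dict:
    # Never record credentials into cassettes.
    return {"filter_headers": ["authorization", "x-api-key"]}
```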
This also includes my second attempt to fix issue :
https://github.com/hwchase17/langchain/issues/2386
Maybe related https://github.com/hwchase17/langchain/issues/2484
Right now, eval chains require an answer for every question. It's
cumbersome to collect this ground truth, so we're getting around this
issue with 2 things:
* Adding a context param in `ContextQAEvalChain` and simply evaluating
if the question is answered accurately from context
* Adding chain-of-thought explanation prompting to improve the accuracy
of this w/o GT.
This also gets to feature parity with openai/evals which has the same
contextual eval w/o GT.
TODO in follow-up:
* Better prompt inheritance. No need for a separate prompt for CoT
reasoning. How can we merge them together?
---------
Co-authored-by: Vashisht Madhavan <vashishtmadhavan@Vashs-MacBook-Pro.local>
This still doesn't handle the following
- non-JSON media types
- anyOf, allOf, oneOf's
And doesn't emit the typescript definitions for referred types yet, but
that can be saved for a separate PR.
Also, we could have better support for Swagger 2.0 specs and OpenAPI
3.0.3 (we can use the same lib for the latter); I recommend offline
conversion for now.
`AgentExecutor` already has support for limiting the number of
iterations. But the amount of time taken for each iteration can vary
quite a bit, so it is difficult to place limits on the execution time.
This PR adds a new field `max_execution_time` to the `AgentExecutor`
model. When called asynchronously, the agent loop is wrapped in an
`asyncio.timeout()` context which triggers the early stopping response
if the time limit is reached. When called synchronously, the agent loop
checks for both the max_iteration limit and the time limit after each
iteration.
When used asynchronously `max_execution_time` gives really tight control
over the max time for an execution chain. When used synchronously, the
chain can unfortunately exceed max_execution_time, but it still gives
more control than trying to estimate the number of max_iterations needed
to cap the execution time.
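Usage would look roughly like this (a sketch; `max_execution_time` is assumed to be forwarded to the `AgentExecutor` by `initialize_agent`):
```python
from langchain.agents import AgentType, initialize_agent, load_tools
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)
tools = load_tools(["llm-math"], llm=llm)
agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    max_iterations=10,
    max_execution_time=10,  # seconds; the loop stops early once this is exceeded
)
agent.run("What is 2 raised to the 0.23 power?")
```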
---------
Co-authored-by: Zachary Jones <zjones@zetaglobal.com>
### Features include
- Metadata based embedding search
- Choice of distance metric function (`L2` for Euclidean, `L1` for
Nuclear, `max` for L-infinity distance, `cos` for cosine similarity,
`dot` for dot product). Defaults to `L2`.
- Returning scores
- Max Marginal Relevance Search
- Deleting samples from the dataset
### Notes
- Added numerous tests; let me know if you would like to shorten them or
make them smarter
---------
Co-authored-by: Davit Buniatyan <d@activeloop.ai>
It's useful to evaluate API Chains against a mock server. This PR makes
an example "robot" server that exposes endpoints for the following:
- Path, Query, and Request Body argument passing
- GET, PUT, and DELETE endpoints, exposed via an OpenAPI spec.
Relies on FastAPI + Uvicorn; I could add them to the dev dependencies
list if you'd like.
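For illustration, a minimal FastAPI server in this spirit (the endpoints here are made up, not the actual "robot" spec):
```python
import uvicorn
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Robot mock server")

class WalkRequest(BaseModel):
    direction: str
    speed: float = 1.0

@app.get("/robot/{robot_id}/status")
def get_status(robot_id: int, verbose: bool = False):
    # Path + query argument passing
    return {"robot_id": robot_id, "verbose": verbose, "status": "ok"}

@app.put("/robot/{robot_id}/walk")
def walk(robot_id: int, body: WalkRequest):
    # Request-body argument passing
    return {"robot_id": robot_id, "walking": body.direction, "speed": body.speed}

if __name__ == "__main__":
    # The OpenAPI spec is served automatically at /openapi.json
    uvicorn.run(app, host="127.0.0.1", port=8000)
```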
- Create a new docker-compose file to start an Elasticsearch instance
for integration tests.
- Add new tests to `test_elasticsearch.py` to verify Elasticsearch
functionality.
- Include an optional group `test_integration` in the `pyproject.toml`
file. This group should contain dependencies for integration tests and
can be installed using the command `poetry install --with
test_integration`. Any new dependencies should be added by running
`poetry add some_new_deps --group "test_integration" `
Note:
New tests run in live mode, which involves end-to-end testing against the
OpenAI API. In the future, adding `pytest-vcr` to record and replay all
API requests would be a nice feature for the testing process. More info:
https://pytest-vcr.readthedocs.io/en/latest/
Fixes https://github.com/hwchase17/langchain/issues/2386