Commit Graph

2507 Commits

Author SHA1 Message Date
Harrison Chase
94c82a189d
bump to 202 (#6262) 2023-06-16 06:52:36 -07:00
hp0404
b01cf0dd54
ArxivAPIWrapper - doc_content_chars_max (#6063)
This PR refactors the ArxivAPIWrapper class making
`doc_content_chars_max` parameter optional. Additionally, tests have
been added to ensure the functionality of the doc_content_chars_max
parameter.

Fixes #6027 (issue)
2023-06-15 22:16:42 -07:00
Daniel King
a9b97aa6f4
Update output format of MosaicML endpoint to be more flexible (#6060)
There will likely be another change or two coming over the next couple
weeks as we stabilize the API, but putting this one in now which just
makes the integration a bit more flexible with the response output
format.

```
(langchain) danielking@MML-1B940F4333E2 langchain % pytest tests/integration_tests/llms/test_mosaicml.py tests/integration_tests/embeddings/test_mosaicml.py 
=================================================================================== test session starts ===================================================================================
platform darwin -- Python 3.10.11, pytest-7.3.1, pluggy-1.0.0
rootdir: /Users/danielking/github/langchain
configfile: pyproject.toml
plugins: asyncio-0.20.3, mock-3.10.0, dotenv-0.5.2, cov-4.0.0, anyio-3.6.2
asyncio: mode=strict
collected 12 items                                                                                                                                                                        

tests/integration_tests/llms/test_mosaicml.py ......                                                                                                                                [ 50%]
tests/integration_tests/embeddings/test_mosaicml.py ......                                                                                                                          [100%]

=================================================================================== slowest 5 durations ===================================================================================
4.76s call     tests/integration_tests/llms/test_mosaicml.py::test_retry_logic
4.74s call     tests/integration_tests/llms/test_mosaicml.py::test_mosaicml_llm_call
4.13s call     tests/integration_tests/llms/test_mosaicml.py::test_instruct_prompt
0.91s call     tests/integration_tests/llms/test_mosaicml.py::test_short_retry_does_not_loop
0.66s call     tests/integration_tests/llms/test_mosaicml.py::test_mosaicml_extra_kwargs
=================================================================================== 12 passed in 19.70s ===================================================================================
```

#### Who can review?

  @hwchase17
  @dev2049
2023-06-15 22:15:39 -07:00
JaysonAlbert
50d9c7d5a4
Fix: change the chatgpt plugin retriever metadata format (#5920)
the current implement put the doc itself as the metadata, but the
document chatgpt plugin retriever returned already has a `metadata`
field, it's better to use that instead.

the original code will throw the following exception when using
`RetrievalQAWithSourcesChain`, becuse it can not find the field
`metadata`:

```python
Exception has occurred: ValueError       (note: full exception trace is shown but execution is paused at: _run_module_as_main)
Document prompt requires documents to have metadata variables: ['source']. Received document with missing metadata: ['source'].
  File "/home/wangjie/anaconda3/envs/chatglm/lib/python3.10/site-packages/langchain/chains/combine_documents/base.py", line 27, in format_document
    raise ValueError(
  File "/home/wangjie/anaconda3/envs/chatglm/lib/python3.10/site-packages/langchain/chains/combine_documents/stuff.py", line 65, in <listcomp>
    doc_strings = [format_document(doc, self.document_prompt) for doc in docs]
  File "/home/wangjie/anaconda3/envs/chatglm/lib/python3.10/site-packages/langchain/chains/combine_documents/stuff.py", line 65, in _get_inputs
    doc_strings = [format_document(doc, self.document_prompt) for doc in docs]
  File "/home/wangjie/anaconda3/envs/chatglm/lib/python3.10/site-packages/langchain/chains/combine_documents/stuff.py", line 85, in combine_docs
    inputs = self._get_inputs(docs, **kwargs)
  File "/home/wangjie/anaconda3/envs/chatglm/lib/python3.10/site-packages/langchain/chains/combine_documents/base.py", line 84, in _call
    output, extra_return_dict = self.combine_docs(
  File "/home/wangjie/anaconda3/envs/chatglm/lib/python3.10/site-packages/langchain/chains/base.py", line 140, in __call__
    raise e
```

Additionally, the `metadata` filed in the `chatgpt plugin retriever`
have these fileds by default:
```json
{
    "source":  "file",   //email, file or chat
    "source_id": "filename.docx", // the filename
    "url": "", 
    ...
}
```
so, we should set `source_id` to `source` in the langchain metadata.

```python
metadata = d.pop("metadata", d)
if(metadata.get("source_id")):
    metadata["source"] = metadata.pop("source_id")
```

#### Who can review?
@dev2049

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049

 -->

---------

Co-authored-by: wangjie <wangjie@htffund.com>
2023-06-15 22:04:45 -07:00
Harrison Chase
e67b26eee9
Harrison/openai functions (#6261)
Co-authored-by: Francisco Ingham <24279597+fpingham@users.noreply.github.com>
2023-06-15 21:54:39 -07:00
Harrison Chase
6aafb46807
Harrison/openai functions (#6223)
Co-authored-by: Francisco Ingham <24279597+fpingham@users.noreply.github.com>
2023-06-15 21:43:33 -07:00
Zander Chase
bc9b8c8239
Improve Error Message for failed callback (#6247)
Include the handler class name in the warning
2023-06-15 19:18:37 -07:00
Alon Roth
0013256e81
Support chat history persistence in AutoGPT (#5716)
**Short Description**
Added a new argument to AutoGPT class which allows to persist the chat
history to a file.

**Changes**
1. Removed the `self.full_message_history: List[BaseMessage] = []`
2. Replaced it with `chat_history_memory` which can take any subclasses
of `BaseChatMessageHistory`

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-06-15 17:49:03 -07:00
Martin Antos
1913320cbe
Feature/add acreom loader (#5780)
adding new loader for [acreom](https://acreom.com) vaults. It's based on
the Obsidian loader with some additional text processing for acreom
specific markdown elements.

 @eyurtsev please take a look!

---------

Co-authored-by: rlm <pexpresss31@gmail.com>
2023-06-15 11:53:00 -07:00
Zander Chase
ae76e473e1
Add Tags for LLMs (#6229)
- [x] Add tracing tags to LLMs + Chat Models (both inheritable and
local)
- [x] Add tags for the run_on_dataset helper function(s)
2023-06-15 11:24:11 -07:00
Harrison Chase
8e1a7a8646
bump version to 201 (#6233) 2023-06-15 08:28:47 -07:00
Harrison Chase
e82687ddf4
Harrison/use functions agent (#6185)
Co-authored-by: Francisco Ingham <24279597+fpingham@users.noreply.github.com>
2023-06-15 08:18:50 -07:00
Ryo Kanazawa
7d2b946d0b
Fix typo pandocs to pandoc (#6203)
Fixes https://github.com/hwchase17/langchain/issues/6204

### Context

An typo issue with `pandoc`.

#### Who can review?
@hwchase17
2023-06-15 08:18:27 -07:00
Kyle Roth
c7db9febb0
count tokens for new OpenAI model versions (#6195)
Trying to call `ChatOpenAI.get_num_tokens_from_messages` returns the
following error for the newly announced models `gpt-3.5-turbo-0613` and
`gpt-4-0613`:

```
NotImplementedError: get_num_tokens_from_messages() is not presently implemented for model gpt-3.5-turbo-0613.See https://github.com/openai/openai-python/blob/main/chatml.md for information on how messages are converted to tokens.
```

This adds support for counting tokens for those models, by counting
tokens the same way they're counted for the previous versions of
`gpt-3.5-turbo` and `gpt-4`.

#### reviewers

  - @hwchase17
  - @agola11
2023-06-15 06:16:03 -07:00
xu0o0
7ad13cdbdb
feat: add content_format param to ConfluenceLoader.load() (#5922)
Confluence API supports difference format of page content. The storage
format is the raw XML representation for storage. The view format is the
HTML representation for viewing with macros rendered as though it is
viewed by users.

Add the `content_format` parameter to `ConfluenceLoader.load()` to
specify the content format, this is
set to `ContentFormat.STORAGE` by default.

#### Who can review?

Tag maintainers/contributors who might be interested: @eyurtsev

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-06-14 16:56:28 -07:00
0xJordan
c5a46e7435
feat: Add support for the Solidity language (#6054)
<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

## Add Solidity programming language support for code splitter.

Twitter: @0xjord4n_

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->
#### Who can review?

Tag maintainers/contributors who might be interested:
@hwchase17

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @hwchase17

  VectorStores / Retrievers / Memory
  - @dev2049

 -->
2023-06-14 14:25:02 -07:00
Nuno Campos
17c4ec4812
Add docs for tags (#6155)
<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

<!-- Remove if not applicable -->

Fixes # (issue)

#### Before submitting

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

#### Who can review?

Tag maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @hwchase17

  VectorStores / Retrievers / Memory
  - @dev2049

 -->
2023-06-14 14:01:58 -07:00
thiswillbeyourgithub
4a649e3b14
typo: 'following following' to 'following' (#6163)
Co-authored-by: thiswillbeyourgithub <github@32mail.33mail.com>
2023-06-14 10:58:47 -07:00
Maciej Bryński
8a44c879c6
Update readthedocs_documentation.ipynb (#6148)
Minor fix in documentation. 
Change URL in wget call to proper one.
2023-06-14 07:21:48 -07:00
Zander Chase
e0e3ef1c57
Update Name (#6136) 2023-06-13 22:25:36 -07:00
Zander Chase
4555ad5d1f
Add Run Collector Callback (#6133)
Add a callback handler that can collect nested run objects. Useful for
evaluation.
2023-06-13 22:17:37 -07:00
Harrison Chase
6ac120f299
bump ver to 200 (#6130) 2023-06-13 19:33:51 -07:00
Harrison Chase
e41f0b341c
add functions agent (#6113) 2023-06-13 18:51:01 -07:00
Zander Chase
b3b155d488
Return session name in runner response (#6112)
Makes it easier to then run evals w/o thinking about specifying a
session
2023-06-13 16:59:43 -07:00
Harrison Chase
e74733ab9e
support streaming for functions (#6115) 2023-06-13 15:26:26 -07:00
Nuno Campos
11ab0be11a
Add support for tags (#5898)
<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

<!-- Remove if not applicable -->

Fixes # (issue)

#### Before submitting

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

#### Who can review?

Tag maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049

 -->
2023-06-13 12:30:59 -07:00
Harrison Chase
1281fdf0f2
Harrison/notebook functions (#6103) 2023-06-13 10:52:54 -07:00
Harrison Chase
34ebb29726
bump version to 199 (#6102) 2023-06-13 10:50:33 -07:00
Wenchen Li
f9edf76e7c
Implement max_marginal_relevance_search in VectorStore of Pinecone (#6056)
This adds implementation of MMR search in pinecone; and I have two
semi-related observations about this vector store class:
- Maybe we should also have a
`similarity_search_by_vector_returning_embeddings` like in supabase, but
it's not in the base `VectorStore` class so I didn't implement
- Talking about the base class, there's
`similarity_search_with_relevance_scores`, but in pinecone it is called
`similarity_search_with_score`; maybe we should consider renaming it to
align with other `VectorStore` base and sub classes (or add that as an
alias for backward compatibility)

#### Who can review?

Tag maintainers/contributors who might be interested:
 - VectorStores / Retrievers / Memory - @dev2049
2023-06-13 10:46:45 -07:00
Harrison Chase
970b2f9d38
convert tools to openai (#6100) 2023-06-13 10:40:49 -07:00
Harrison Chase
292accde2b
support functions (#6099) 2023-06-13 10:32:58 -07:00
Lance Martin
ee3d0513ad
Add tests and update notebook for MarkdownHeaderTextSplitter (#6069)
Add test and update notebook for `MarkdownHeaderTextSplitter`.
2023-06-13 09:07:52 -07:00
Keshav Kumar
8fdf88b8e3
Fix for ModuleNotFoundError while running langchain-server. Issue #5833 (#6077)
This PR fixes the error
`ModuleNotFoundError: No module named 'langchain.cli'`
Fixes https://github.com/hwchase17/langchain/issues/5833 (issue)
2023-06-13 08:37:07 -07:00
Zander Chase
0c52275bdb
Use Run object from SDK (#6067)
Update the Run object in the tracer to extend that in the SDK to include
the parameters necessary for tracking/tracing
2023-06-13 07:14:11 -07:00
Harrison Chase
cde1e8739a
turn off repr (#6078) 2023-06-12 22:45:24 -07:00
Nuno Campos
a9b3b2e327
Enable serialization for anthropic (#6049) 2023-06-12 22:39:10 -07:00
Harrison Chase
6ac5d80286
propogate kwargs fully (#6076) 2023-06-12 22:37:55 -07:00
Harrison Chase
ec1a2adf9c
improve tools (#6062) 2023-06-12 22:19:03 -07:00
Julius Lipp
5b6bbf4ab2
Add embaas document extraction api endpoints (#6048)
# Introduces embaas document extraction api endpoints

In this PR, we add support for embaas document extraction endpoints to
Text Embedding Models (with LLMs, in different PRs coming). We currently
offer the MTEB leaderboard top performers, will continue to add top
embedding models and soon add support for customers to deploy thier own
models. Additional Documentation + Infomation can be found
[here](https://embaas.io).

While developing this integration, I closely followed the patterns
established by other langchain integrations. Nonetheless, if there are
any aspects that require adjustments or if there's a better way to
present a new integration, let me know! :)

Additionally, I fixed some docs in the embeddings integration.

Related PR: #5976 

#### Who can review?
  DataLoaders
  - @eyurtsev
2023-06-12 19:13:52 -07:00
Zander Chase
2f0088039d
Log tracer errors (#6066)
Example (would log several times if not for the helper fn. Would emit no
logs due to mulithreading previously)

![image](https://github.com/hwchase17/langchain/assets/130414180/070d25ae-1f06-4487-9617-0a6f66f3f01e)
2023-06-12 17:13:49 -07:00
Lance Martin
b023f0c0f2
Text splitter for Markdown files by header (#5860)
This creates a new kind of text splitter for markdown files.

The user can supply a set of headers that they want to split the file
on.

We define a new text splitter class, `MarkdownHeaderTextSplitter`, that
does a few things:

(1) For each line, it determines the associated set of user-specified
headers
(2) It groups lines with common headers into splits

See notebook for example usage and test cases.
2023-06-12 15:46:42 -07:00
Jens Madsen
2c91f0d750
chore: spedd up integration test by using smaller model (#6044)
Adds a new parameter `relative_chunk_overlap` for the
`SentenceTransformersTokenTextSplitter` constructor. The parameter sets
the chunk overlap using a relative factor, e.g. for a model where the
token limit is 100, a `relative_chunk_overlap=0.5` implies that
`chunk_overlap=50`

Tag maintainers/contributors who might be interested:

 @hwchase17, @dev2049
2023-06-12 13:27:10 -07:00
Harrison Chase
5922742d56 comment out 2023-06-12 10:57:31 -07:00
Harrison Chase
681ba6d520 embaas title 2023-06-12 08:00:14 -07:00
Ben Flast
7a5e36f3f5
Mongo db doc fix (#6042)
I missed a few errors in my initial fix @hwchase1.  Thanks!
2023-06-12 07:29:27 -07:00
Harrison Chase
289e9aeb9d
bump ver to 198 (#6026) 2023-06-11 21:32:45 -07:00
Harrison Chase
d1561b74eb
Harrison/cognitive search (#6011)
Co-authored-by: Fabrizio Ruocco <ruoccofabrizio@gmail.com>
2023-06-11 21:15:42 -07:00
wenmeng zhou
bb7ac9edb5
add dashscope text embedding (#5929)
#### What I do
Adding embedding api for
[DashScope](https://help.aliyun.com/product/610100.html), which is the
DAMO Academy's multilingual text unified vector model based on the LLM
base. It caters to multiple mainstream languages worldwide and offers
high-quality vector services, helping developers quickly transform text
data into high-quality vector data. Currently supported languages
include Chinese, English, Spanish, French, Portuguese, Indonesian, and
more.

#### Who can review?

  Models
  - @hwchase17
  - @agola11

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-06-11 21:14:20 -07:00
Ben Flast
010d0bfeea
Update MongoDB Atlas support docs (#6022)
Updating MongoDB Atlas support docs @hwchase17 let me know if you have
any questions
2023-06-11 20:57:15 -07:00
Harrison Chase
e05997c25e
Harrison/hologres (#6012)
Co-authored-by: Changgeng Zhao <changgeng@nyu.edu>
Co-authored-by: Changgeng Zhao <zhaochanggeng.zcg@alibaba-inc.com>
2023-06-11 20:56:51 -07:00