Commit Graph

2549 Commits

Author SHA1 Message Date
Martin Antos
1913320cbe
Feature/add acreom loader (#5780)
adding new loader for [acreom](https://acreom.com) vaults. It's based on
the Obsidian loader with some additional text processing for acreom
specific markdown elements.

 @eyurtsev please take a look!

---------

Co-authored-by: rlm <pexpresss31@gmail.com>
2023-06-15 11:53:00 -07:00
Zander Chase
ae76e473e1
Add Tags for LLMs (#6229)
- [x] Add tracing tags to LLMs + Chat Models (both inheritable and
local)
- [x] Add tags for the run_on_dataset helper function(s)
2023-06-15 11:24:11 -07:00
Harrison Chase
8e1a7a8646
bump version to 201 (#6233) 2023-06-15 08:28:47 -07:00
Harrison Chase
e82687ddf4
Harrison/use functions agent (#6185)
Co-authored-by: Francisco Ingham <24279597+fpingham@users.noreply.github.com>
2023-06-15 08:18:50 -07:00
Ryo Kanazawa
7d2b946d0b
Fix typo pandocs to pandoc (#6203)
Fixes https://github.com/hwchase17/langchain/issues/6204

### Context

An typo issue with `pandoc`.

#### Who can review?
@hwchase17
2023-06-15 08:18:27 -07:00
Kyle Roth
c7db9febb0
count tokens for new OpenAI model versions (#6195)
Trying to call `ChatOpenAI.get_num_tokens_from_messages` returns the
following error for the newly announced models `gpt-3.5-turbo-0613` and
`gpt-4-0613`:

```
NotImplementedError: get_num_tokens_from_messages() is not presently implemented for model gpt-3.5-turbo-0613.See https://github.com/openai/openai-python/blob/main/chatml.md for information on how messages are converted to tokens.
```

This adds support for counting tokens for those models, by counting
tokens the same way they're counted for the previous versions of
`gpt-3.5-turbo` and `gpt-4`.

#### reviewers

  - @hwchase17
  - @agola11
2023-06-15 06:16:03 -07:00
xu0o0
7ad13cdbdb
feat: add content_format param to ConfluenceLoader.load() (#5922)
Confluence API supports difference format of page content. The storage
format is the raw XML representation for storage. The view format is the
HTML representation for viewing with macros rendered as though it is
viewed by users.

Add the `content_format` parameter to `ConfluenceLoader.load()` to
specify the content format, this is
set to `ContentFormat.STORAGE` by default.

#### Who can review?

Tag maintainers/contributors who might be interested: @eyurtsev

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-06-14 16:56:28 -07:00
0xJordan
c5a46e7435
feat: Add support for the Solidity language (#6054)
<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

## Add Solidity programming language support for code splitter.

Twitter: @0xjord4n_

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->
#### Who can review?

Tag maintainers/contributors who might be interested:
@hwchase17

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @hwchase17

  VectorStores / Retrievers / Memory
  - @dev2049

 -->
2023-06-14 14:25:02 -07:00
Nuno Campos
17c4ec4812
Add docs for tags (#6155)
<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

<!-- Remove if not applicable -->

Fixes # (issue)

#### Before submitting

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

#### Who can review?

Tag maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @hwchase17

  VectorStores / Retrievers / Memory
  - @dev2049

 -->
2023-06-14 14:01:58 -07:00
thiswillbeyourgithub
4a649e3b14
typo: 'following following' to 'following' (#6163)
Co-authored-by: thiswillbeyourgithub <github@32mail.33mail.com>
2023-06-14 10:58:47 -07:00
Maciej Bryński
8a44c879c6
Update readthedocs_documentation.ipynb (#6148)
Minor fix in documentation. 
Change URL in wget call to proper one.
2023-06-14 07:21:48 -07:00
Zander Chase
e0e3ef1c57
Update Name (#6136) 2023-06-13 22:25:36 -07:00
Zander Chase
4555ad5d1f
Add Run Collector Callback (#6133)
Add a callback handler that can collect nested run objects. Useful for
evaluation.
2023-06-13 22:17:37 -07:00
Harrison Chase
6ac120f299
bump ver to 200 (#6130) 2023-06-13 19:33:51 -07:00
Harrison Chase
e41f0b341c
add functions agent (#6113) 2023-06-13 18:51:01 -07:00
Zander Chase
b3b155d488
Return session name in runner response (#6112)
Makes it easier to then run evals w/o thinking about specifying a
session
2023-06-13 16:59:43 -07:00
Harrison Chase
e74733ab9e
support streaming for functions (#6115) 2023-06-13 15:26:26 -07:00
Nuno Campos
11ab0be11a
Add support for tags (#5898)
<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

<!-- Remove if not applicable -->

Fixes # (issue)

#### Before submitting

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

#### Who can review?

Tag maintainers/contributors who might be interested:

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049

 -->
2023-06-13 12:30:59 -07:00
Harrison Chase
1281fdf0f2
Harrison/notebook functions (#6103) 2023-06-13 10:52:54 -07:00
Harrison Chase
34ebb29726
bump version to 199 (#6102) 2023-06-13 10:50:33 -07:00
Wenchen Li
f9edf76e7c
Implement max_marginal_relevance_search in VectorStore of Pinecone (#6056)
This adds implementation of MMR search in pinecone; and I have two
semi-related observations about this vector store class:
- Maybe we should also have a
`similarity_search_by_vector_returning_embeddings` like in supabase, but
it's not in the base `VectorStore` class so I didn't implement
- Talking about the base class, there's
`similarity_search_with_relevance_scores`, but in pinecone it is called
`similarity_search_with_score`; maybe we should consider renaming it to
align with other `VectorStore` base and sub classes (or add that as an
alias for backward compatibility)

#### Who can review?

Tag maintainers/contributors who might be interested:
 - VectorStores / Retrievers / Memory - @dev2049
2023-06-13 10:46:45 -07:00
Harrison Chase
970b2f9d38
convert tools to openai (#6100) 2023-06-13 10:40:49 -07:00
Harrison Chase
292accde2b
support functions (#6099) 2023-06-13 10:32:58 -07:00
Lance Martin
ee3d0513ad
Add tests and update notebook for MarkdownHeaderTextSplitter (#6069)
Add test and update notebook for `MarkdownHeaderTextSplitter`.
2023-06-13 09:07:52 -07:00
Keshav Kumar
8fdf88b8e3
Fix for ModuleNotFoundError while running langchain-server. Issue #5833 (#6077)
This PR fixes the error
`ModuleNotFoundError: No module named 'langchain.cli'`
Fixes https://github.com/hwchase17/langchain/issues/5833 (issue)
2023-06-13 08:37:07 -07:00
Zander Chase
0c52275bdb
Use Run object from SDK (#6067)
Update the Run object in the tracer to extend that in the SDK to include
the parameters necessary for tracking/tracing
2023-06-13 07:14:11 -07:00
Harrison Chase
cde1e8739a
turn off repr (#6078) 2023-06-12 22:45:24 -07:00
Nuno Campos
a9b3b2e327
Enable serialization for anthropic (#6049) 2023-06-12 22:39:10 -07:00
Harrison Chase
6ac5d80286
propogate kwargs fully (#6076) 2023-06-12 22:37:55 -07:00
Harrison Chase
ec1a2adf9c
improve tools (#6062) 2023-06-12 22:19:03 -07:00
Julius Lipp
5b6bbf4ab2
Add embaas document extraction api endpoints (#6048)
# Introduces embaas document extraction api endpoints

In this PR, we add support for embaas document extraction endpoints to
Text Embedding Models (with LLMs, in different PRs coming). We currently
offer the MTEB leaderboard top performers, will continue to add top
embedding models and soon add support for customers to deploy thier own
models. Additional Documentation + Infomation can be found
[here](https://embaas.io).

While developing this integration, I closely followed the patterns
established by other langchain integrations. Nonetheless, if there are
any aspects that require adjustments or if there's a better way to
present a new integration, let me know! :)

Additionally, I fixed some docs in the embeddings integration.

Related PR: #5976 

#### Who can review?
  DataLoaders
  - @eyurtsev
2023-06-12 19:13:52 -07:00
Zander Chase
2f0088039d
Log tracer errors (#6066)
Example (would log several times if not for the helper fn. Would emit no
logs due to mulithreading previously)

![image](https://github.com/hwchase17/langchain/assets/130414180/070d25ae-1f06-4487-9617-0a6f66f3f01e)
2023-06-12 17:13:49 -07:00
Lance Martin
b023f0c0f2
Text splitter for Markdown files by header (#5860)
This creates a new kind of text splitter for markdown files.

The user can supply a set of headers that they want to split the file
on.

We define a new text splitter class, `MarkdownHeaderTextSplitter`, that
does a few things:

(1) For each line, it determines the associated set of user-specified
headers
(2) It groups lines with common headers into splits

See notebook for example usage and test cases.
2023-06-12 15:46:42 -07:00
Jens Madsen
2c91f0d750
chore: spedd up integration test by using smaller model (#6044)
Adds a new parameter `relative_chunk_overlap` for the
`SentenceTransformersTokenTextSplitter` constructor. The parameter sets
the chunk overlap using a relative factor, e.g. for a model where the
token limit is 100, a `relative_chunk_overlap=0.5` implies that
`chunk_overlap=50`

Tag maintainers/contributors who might be interested:

 @hwchase17, @dev2049
2023-06-12 13:27:10 -07:00
Harrison Chase
5922742d56 comment out 2023-06-12 10:57:31 -07:00
Harrison Chase
681ba6d520 embaas title 2023-06-12 08:00:14 -07:00
Ben Flast
7a5e36f3f5
Mongo db doc fix (#6042)
I missed a few errors in my initial fix @hwchase1.  Thanks!
2023-06-12 07:29:27 -07:00
Harrison Chase
289e9aeb9d
bump ver to 198 (#6026) 2023-06-11 21:32:45 -07:00
Harrison Chase
d1561b74eb
Harrison/cognitive search (#6011)
Co-authored-by: Fabrizio Ruocco <ruoccofabrizio@gmail.com>
2023-06-11 21:15:42 -07:00
wenmeng zhou
bb7ac9edb5
add dashscope text embedding (#5929)
#### What I do
Adding embedding api for
[DashScope](https://help.aliyun.com/product/610100.html), which is the
DAMO Academy's multilingual text unified vector model based on the LLM
base. It caters to multiple mainstream languages worldwide and offers
high-quality vector services, helping developers quickly transform text
data into high-quality vector data. Currently supported languages
include Chinese, English, Spanish, French, Portuguese, Indonesian, and
more.

#### Who can review?

  Models
  - @hwchase17
  - @agola11

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-06-11 21:14:20 -07:00
Ben Flast
010d0bfeea
Update MongoDB Atlas support docs (#6022)
Updating MongoDB Atlas support docs @hwchase17 let me know if you have
any questions
2023-06-11 20:57:15 -07:00
Harrison Chase
e05997c25e
Harrison/hologres (#6012)
Co-authored-by: Changgeng Zhao <changgeng@nyu.edu>
Co-authored-by: Changgeng Zhao <zhaochanggeng.zcg@alibaba-inc.com>
2023-06-11 20:56:51 -07:00
ljeagle
c5bce4a465
add from_documents interface in awadb vector store (#6023)
added new interface from_documents in awadb vector store
  @dev2049

---------

Co-authored-by: vincent <awadb.vincent@gmail.com>
2023-06-11 19:35:03 -07:00
Zander Chase
2c9619bc1d
Remove from PR template (#6018) 2023-06-11 19:34:26 -07:00
ju-bezdek
18f5c985d9
Langchain decorators (#6017)
Added description of LangChain Decorators  into the integration section

<!--
Thank you for contributing to LangChain! Your PR will appear in our
release under the title you set. Please make sure it highlights your
valuable contribution.

Replace this with a description of the change, the issue it fixes (if
applicable), and relevant context. List any dependencies required for
this change.

After you're done, someone will review your PR. They may suggest
improvements. If no one reviews your PR within a few days, feel free to
@-mention the same people again, as notifications can get lost.

Finally, we'd love to show appreciation for your contribution - if you'd
like us to shout you out on Twitter, please also include your handle!
-->

<!-- Remove if not applicable -->


#### Before submitting

<!-- If you're adding a new integration, please include:

1. a test for the integration - favor unit tests that does not rely on
network access.
2. an example notebook showing its use


See contribution guidelines for more information on how to write tests,
lint
etc:


https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
-->

#### Who can review?

Tag maintainers/contributors who might be interested:

@hwchase17 

<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049

 -->
2023-06-11 19:32:24 -07:00
Zander Chase
a197acfcd3
Update check (#6020)
We were assigning the name as None in on_chat_model_start then not
updating, resulting in a validation error.
2023-06-11 17:59:09 -07:00
Nuno Campos
18af149e91
nc/load (#5733)
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-06-11 15:51:28 -07:00
Zander Chase
614cff89bc
I before E (#6015) 2023-06-11 15:45:12 -07:00
Harrison Chase
a7227ee01b
Harrison/embaas (#6010)
Co-authored-by: Julius Lipp <43986145+juliuslipp@users.noreply.github.com>
2023-06-11 13:35:14 -07:00
xu0o0
232faba796
fix: TypeError when loading confluence pages by cql (#5878)
The Confluence loader uses the wrong API (`Confluence.cql()` provided by
`atlassian-python-api`) to load pages by CQL.
`Confluence.cql()` is a wrapper of the `/rest/api/search` API which
searches for entities in Confluence.

To search for pages in Confluence, the loader can use the
`/rest/api/content/search` API.

#### Who can review?

Tag maintainers/contributors who might be interested: @eyurtsev
<!-- For a quicker response, figure out the right person to tag with @

  @hwchase17 - project lead

  Tracing / Callbacks
  - @agola11

  Async
  - @agola11

  DataLoaders
  - @eyurtsev

  Models
  - @hwchase17
  - @agola11

  Agents / Tools / Toolkits
  - @vowelparrot

  VectorStores / Retrievers / Memory
  - @dev2049

 -->
#### References
##### Cloud API

https://developer.atlassian.com/cloud/confluence/rest/v1/api-group-content/#api-wiki-rest-api-content-search-get

https://developer.atlassian.com/cloud/confluence/rest/v1/api-group-search/#api-wiki-rest-api-search-get

##### Server API

https://docs.atlassian.com/ConfluenceServer/rest/8.3.1/#api/content-search
https://docs.atlassian.com/ConfluenceServer/rest/8.3.1/#api/search
2023-06-11 13:23:22 -07:00