Commit Graph

1465 Commits (84d250f78118168139069a71b8c02c8f85ed2d01)

Author SHA1 Message Date
Nuno Campos cf448a6314 Ensure that configurable fields with enums support deduplication 12 months ago
Leonid Ganeline 31f264169d
evaluation criteria (#11681)
the updated value was:
` Criteria.MISOGYNY: "Is the submission misogynistic? If so, respond Y."
`
The " If so, respond Y." should not be here. This sub-string is not
presented in any other criteria and should not be presented here.
I also added a synonym to "misogynistic" as it done in many other
criteria.
12 months ago
Dmitry Tyumentsev e8c1850369
Add YandexGPT LLM and Chat model (#11703)
**Description:** Introducing an ability to work with the
[YandexGPT](https://cloud.yandex.com/en/services/yandexgpt) language
model.
12 months ago
Bagatur c15701eebf
Revert "Add baichuan model" (#11901)
cc @cloudscool, apologies your PR wasn't actually passing CI
12 months ago
cloudscool c1d811c4bc
Add baichuan model 12 months ago
John Mai 0169d45ba8
Supported OutputFixingParser max_retries (#11754)
Description: Supported OutputFixingParser max_retries
 - max_retries: Maximum number of retries to parser.

Issue: None
Dependencies: None
Tag maintainer: @baskaryan
Twitter handle: @JohnMai95

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
12 months ago
volodymyr-memsql ff8e6981ff
SingleStoreDBChatMessageHistory: Add singlestoredb support for ChatMessageHistory (#11705)
**Description**

- Added the `SingleStoreDBChatMessageHistory` class that inherits
`BaseChatMessageHistory` and allows to use of a SingleStoreDB database
as a storage for chat message history.
- Added integration test to check that everything works (requires
`singlestoredb` to be installed)
- Added notebook with usage example
- Removed custom retriever for SingleStoreDB vector store (as it is
useless)

---------

Co-authored-by: Volodymyr Tkachuk <vtkachuk-ua@singlestore.com>
12 months ago
Mohammad Mohtashim 634ccb8ccd
test_stream_log_retriever Unit Test + Tool names fix (#11808)
## Description



| Tool         | Original Tool Name       |
|-----------------------------|---------------------------|
| open-meteo-api              | Open Meteo API            |
| news-api                    | News API                  |
| tmdb-api                    | TMDB API                  |
| podcast-api                 | Podcast API               |
| golden_query                | Golden Query              |
| dall-e-image-generator      | Dall-E Image Generator    |
| twilio                      | Text Message              |
| searx_search_results        | Searx Search Results      |
| dataforseo                  | DataForSeo Results JSON   |

When using these tools through `load_tools`, I encountered the following
validation error:

```console
openai.error.InvalidRequestError: 'TMDB API' does not match '^[a-zA-Z0-9_-]{1,64}$' - 'functions.0.name'
```

In order to avoid this error, I replaced spaces with hyphens in the tool
names:

| Tool           | Corrected Tool Name       |
|-----------------------------|---------------------------|
| open-meteo-api              | Open-Meteo-API            |
| news-api                    | News-API                  |
| tmdb-api                    | TMDB-API                  |
| podcast-api                 | Podcast-API               |
| golden_query                | Golden-Query              |
| dall-e-image-generator      | Dall-E-Image-Generator    |
| twilio                      | Text-Message              |
| searx_search_results        | Searx-Search-Results      |
| dataforseo                  | DataForSeo-Results-JSON   |

This correction resolved the validation error.

Additionally, a unit test,
`tests/unit_tests/schema/runnable/test_runnable.py::test_stream_log_retriever`,
was failing at random. Upon further investigation, I confirmed that the
failure was not related to the above-mentioned changes. The `stream_log`
variable was generating the order of logs in two ways at random The
reason for this behavior is unclear, but in the assertion, I included
both possible orders to account for this variability.
12 months ago
Predrag Gruevski 7c0f1bf23f
Upgrade experimental package dependencies and use Poetry 1.6.1. (#11339)
Part of upgrading our CI to use Poetry 1.6.1.
12 months ago
Eugene Yurtsev c2c0814a94
Add security notice to file management tool (#11878)
Add security notice to file management tool

---------

Co-authored-by: Predrag Gruevski <2348618+obi1kenobi@users.noreply.github.com>
12 months ago
zhaoshengbo cb7e12f6ba
Adapt to the latest version of Alibaba Cloud OpenSearch vector store API (#11849)
Hello Folks,

Alibaba Cloud OpenSearch has released a new version of the vector
storage engine, which has significantly improved performance compared to
the previous version. At the same time, the sdk has also undergone
changes, requiring adjustments alibaba opensearch vector store code to
adapt.

This PR includes:

Adapt to the latest version of Alibaba Cloud OpenSearch API.
More comprehensive unit testing.
Improve documentation.

I have read your contributing guidelines. And I have passed the tests
below

- [x] make format
- [x]  make lint
- [x]  make coverage
- [x]  make test

---------

Co-authored-by: zhaoshengbo <shengbo.zsb@alibaba-inc.com>
12 months ago
Lee e669f9d731
Fix: Sitemap Document Loader Tests and Documentation (#11866)
**Description:**
While working on the Docusaurus site loader #9138, I noticed some
outdated docs and tests for the Sitemap Loader.

**Issue:** 
This is tangentially related to #6691 in reference to doc links. I plan
on digging in to a few of these issue when I find time next.
12 months ago
Jean-Louis Queguiner 8b697ff0ee
feat(llm): add together.xyz as an LLM provider (#11892)
- **Description:** added together.xyz as an LLM provider, 
  - **Issues:** fix some linting issues
  - twitter handle @jilijeanlouis 

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
12 months ago
Leonid Kuligin d269dd2e2f
added a multiturn search based on Vertex AI Search (#11885)
Replace this entire comment with:
- **Description:** Added a retriever based on multi-turn Vertex AI
Search
  - **Twitter handle:** lkuligin
12 months ago
Leonid Kuligin 38ed55245f
added Vertex examples as attributes (#11890)
- **Description:** added examples to Vertex chat models as optional
class attributes, so that a model with examples can be used inside a
chain
  - **Twitter handle:** lkuligin
12 months ago
eryk-dsai 5019f59724
fix: more robust check whether the HF model is quantized (#11891)
Removes the check of `model.is_quantized` and adds more robust way of
checking for 4bit and 8bit quantization in the `huggingface_pipeline.py`
script. I had to make the original change on the outdated version of
`transformers`, because the models had this property before. Seems
redundant now.

Fixes: https://github.com/langchain-ai/langchain/issues/11809 and
https://github.com/langchain-ai/langchain/issues/11759
12 months ago
Eugene Yurtsev 210a48cfb5
Add security considerations (#11869)
Add security considerations to existing graph tools.
12 months ago
Bagatur 25b1d65305
bump 315 (#11850) 12 months ago
Nuno Campos 4321d192ea
Use a less specific return type for | on Runnables (#11762)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
12 months ago
Harrison Chase a506302772
bearly tool (#11812) 12 months ago
Harrison Chase 4a2f0c51a1
use get_llm_cache and set_llm_cache (#11741)
Co-authored-by: Bagatur <baskaryan@gmail.com>
12 months ago
Harrison Chase f3ad22e64a
pipe default key (#11788) 12 months ago
Eugene Yurtsev 0d37b4c27d
Add python,pandas,xorbits,spark agents to experimental (#11774)
See for contex
https://github.com/langchain-ai/langchain/discussions/11680
12 months ago
Michael Feil 233a904f2e
GradientLLM Docs update and model_id renaming. (#10963)
Related to #10800 

- Errors in the Docstring of GradientLLM / Gradient.ai LLM
- Renamed the `model_id` to `model` and adapting this in all tests.
Reason to so is to be in Sync with `GradientEmbeddings` and other LLM's.
- inmproving tests so they check the headers in the sent request.
- making the aiosession a private attribute in the docs, as in the
future `pip install gradientai` will be replacing aiosession.
- adding a example how to fine-tune on the Prompt Template as suggested
in #10800
12 months ago
Bagatur 1559ba4bfc
fix upstash test import (#11781) 12 months ago
Leonid Kuligin 9f0a718198
added candidate_count for Vertex models (#11729)
- **Description:** added support for `candidate_count` parameter on
Vertex
12 months ago
David 9d200e6cbe
Create ChatEverlyAI (#11357)
- Description: Adds the ChatEverlyAI class with llama-2 7b on [EverlyAI
Hosted
Endpoints](https://everlyai.xyz/)
- It inherits from ChatOpenAI and requires openai (probably unnecessary
but it made for a quick and easy implementation)

---------

Co-authored-by: everly-studio <127131037+everly-studio@users.noreply.github.com>
12 months ago
Hristo G 7fb25b4154
Add graceful fallback for ES vectorstore when content field is missing (#11726)
- **Description:**
- If the Elasticsearch field used for Langchain > Document.page_content
is missing because the specific document is
        somehow malformed fail gracefully.

  - **Tag maintainer:** 
    - @joemcelroy
12 months ago
Bagatur f06fcde0d7
rm duplicate zilliz import (#11777) 12 months ago
Bagatur a3330c4258
bump 314 (#11773) 12 months ago
Erick Friis 1861cc7100
General anthropic functions, steps towards experimental integration tests (#11727)
To match change in js here
https://github.com/langchain-ai/langchainjs/pull/2892

Some integration tests need a bit more work in experimental:
![Screenshot 2023-10-12 at 12 02 49
PM](https://github.com/langchain-ai/langchain/assets/9557659/262d7d22-c405-40e9-afef-669e8d585307)

Pretty sure the sqldatabase ones are an actual regression or change in
interface because it's returning a placeholder.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
12 months ago
Nuno Campos 17c69678ab
Revert "New add Baichuan Model" (#11761)
Reverts langchain-ai/langchain#11714

This has linting and formatting issues, plus it's added to chat models
folder but doesn't subclass Chat Model base class
12 months ago
cloudscool 56653c53aa
New add Baichuan Model (#11714)
Motivation and Context
At present, the Baichuan Large Language Model is relatively popular and
efficient in performance. Due to widespread market recognition, this
model has been added to enhance the scalability of Langchain's ability
to access the big language model, so as to facilitate application access
and usage for interested users.

System Info
langchain: 0.0.295
python:3.8.3
IDE:vs code

Description
Add the following files:

1. Add baichuan_baichuaninc_endpoint.py in the
libs/langchain/langchain/chat_models
2. Modify the __init__.py file,which is located in the
libs/langchain/langchain/chat_models/__init__.py:
a. Add "from langchain.chat_models.baichuan_baichuaninc_endpoint import
BaichuanChatEndpoint"
    b. Add "BaichuanChatEndpoint" In the file's __ All__  method

Your contribution
I am willing to help implement this feature and submit a PR, but I would
appreciate guidance from the maintainers or community to ensure the
changes are made correctly and in line with the project's standards and
practices.
12 months ago
Yang, Bo 9e1e0f54d2
Add `TrainableLLM` (#11721)
- **Description:** Add `TrainableLLM` for those LLM support fine-tuning
  - **Tag maintainer:** @hwchase17

This PR add training methods to `GradientLLM`

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
12 months ago
Burak Yılmaz 63e516c2b0
Upstash redis integration (#10871)
- **Description:** Introduced Upstash provider with following wrappers:
UpstashRedisCache, UpstashRedisEntityStore,
UpstashRedisChatMessageHistory, UpstashRedisStore
  - **Issue:** -,
  - **Dependencies:** upstash-redis python package is needed,
  - **Tag maintainer:** @baskaryan 
  - **Twitter handle:** @BurakY744

---------

Co-authored-by: Burak Yılmaz <burakyilmaz@Buraks-MacBook-Pro.local>
Co-authored-by: Bagatur <baskaryan@gmail.com>
12 months ago
Bagatur a9db2b0b92
fix tongyi import (#11745) 12 months ago
Aaron Pham 6c61315067
fix(openllm): update with newer remote client implementation (#11740)
cc @baskaryan

---------

Signed-off-by: Aaron <29749331+aarnphm@users.noreply.github.com>
12 months ago
Richy Wang 11cdfe44af
Implement Alibaba Tongyi chat model apis. (#10922)
Hi there
This PR is aim to implement chat model for Alibaba Tongyi LLM model. It
contains work below:
1.Implement ChatTongyi chat model in langchain.chat_models.tongyi. Note
this is different with tongyi llm model to another PR
https://github.com/langchain-ai/langchain/pull/10878.
For detail it implements _generate() and _stream() function in
ChatTongyi.
2. Add some examples in chat/tongyi.ipynb. 
3. Add integration test in chat_models/test_tongyi.py 

Note async completion for the Text API is not yet supported.
Dependencies: dashscope. It will be installed manually cause it is not
need by everyone.
12 months ago
Adam Demjen 008348ce71
Add ElasticsearchChatMessageHistory (#10932)
**Description**

This PR adds the `ElasticsearchChatMessageHistory` implementation that
stores chat message history in the configured
[Elasticsearch](https://www.elastic.co/elasticsearch/) deployment.

```python
from langchain.memory.chat_message_histories import ElasticsearchChatMessageHistory

history = ElasticsearchChatMessageHistory(
    es_url="https://my-elasticsearch-deployment-url:9200", index="chat-history-index", session_id="123"
)

history.add_ai_message("This is me, the AI")
history.add_user_message("This is me, the human")
```

**Dependencies**
- [elasticsearch client](https://elasticsearch-py.readthedocs.io/)
required

Co-authored-by: Bagatur <baskaryan@gmail.com>
12 months ago
Jonathan Soma 48cf978391
Allow placeholders in OpenAPI endpoints #2938 (#2940)
Use regex matches when checking endpoints instead of exact matches.
`{varname}` becomes `.*`

Fixes #2938

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
12 months ago
Predrag Gruevski 9e32120cbb
Deprecate direct access to globals like `debug` and `verbose`. (#11311)
Instead of accessing `langchain.debug`, `langchain.verbose`, or
`langchain.llm_cache`, please use the new getter/setter functions in
`langchain.globals`:
- `langchain.globals.set_debug()` and `langchain.globals.get_debug()`
- `langchain.globals.set_verbose()` and
`langchain.globals.get_verbose()`
- `langchain.globals.set_llm_cache()` and
`langchain.globals.get_llm_cache()`

Using the old globals directly will now raise a warning.

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
12 months ago
Richard Adams 35965df20d
Rspace doc loader (#11511)
**Description:**

Add a document loader for the RSpace Electronic Lab Notebook
(www.researchspace.com), so that scientific documents and research notes
can be easily pulled into Langchain pipelines.

**Issue**

This is an new contribution, rather than an issue fix.

 **Dependencies:** 
  
There are no new required dependencies.
In order to use the loader, clients will need to install rspace_client
SDK using `pip install rspace_client`

---------

Co-authored-by: richarda23 <richard.c.adams@infinityworks.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
12 months ago
Ryan Zotti 9d1867c77f
Update docs to specify Indexing-API-compatible vectorstores (#11581)
**Description:** Update Indexing API docs to specify vectorstores that
are compatible with the Indexing API. I add a unit test to remind
developers to update the documentation whenever they add or change a
vectorstore in a way that affects compatibility. For the unit test I
repurposed existing code from
[here](https://github.com/langchain-ai/langchain/blob/v0.0.311/libs/langchain/langchain/indexes/_api.py#L245-L257).

This is my first PR to an open source project. This is a trivially
simple PR whose main purpose is to make me more comfortable submitting
Langchain PRs. If this PR goes through I plan to submit PRs with more
substantive changes in the near future.

**Issue:** Resolves
[10482](https://github.com/langchain-ai/langchain/discussions/10482).

**Dependencies:** No new dependencies.

**Twitter handle:** None.
12 months ago
Richard Wang 6402c33299
Let Notion document loader support utf-8 and make it default. (#10613)
Use utf-8 encoding by default
12 months ago
Bagatur bd74eba152
add azure openai sched tests (#11723) 12 months ago
Bagatur 9c0584be74
bump 313 (#11718) 12 months ago
sudranga 361f8e1bc6
Add MMR functionality to elasticsearch retriever (#11633)
Allows MMR functionality only for the case where we have access to the
embedding function. Also allows for users to request for fields from
elasticsearch store. These are added to the document metadata.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
12 months ago
Dmitry Tyumentsev ead9d5b55c
Add yandex stt parser (#11435)
Description: Introducing an ability to load a transcription document of
audio file using [Yandex
SpeechKit](https://cloud.yandex.com/en-ru/services/speechkit)
Issue: None
Dependencies: yandex-speechkit
Tag maintainer: @rlancemartin, @eyurtsev
12 months ago
Janos Tolgyesi 15687a28d5
Use correct tokenizer for Bedrock/Anthropic LLMs (#11561)
**Description**

This PR implements the usage of the correct tokenizer in Bedrock LLMs,
if using anthropic models.

**Issue:** #11560

**Dependencies:** optional dependency on `anthropic` python library.

**Twitter handle:** jtolgyesi


---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
12 months ago
kYLe 467b082c34
Modify Anyscale integration to work with Anyscale Endpoint (#11569)
**Description:** Modify Anyscale integration to work with [Anyscale
Endpoint](https://docs.endpoints.anyscale.com/)
and it supports invoke, async invoke, stream and async invoke features

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
12 months ago
plpycoin 51193309ea
Update readthedocs.py (#11110)
Only parse .html files
.svg .png favicon.ico will crash processing phase

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
12 months ago
Nuno Campos ca9de26f2b
Add callback function to RunnablePassthrough (#11564)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
12 months ago
Nuno Campos 7f4734c0dd
Add deploy command to repos generated by cli template (#11711)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
12 months ago
Nuno Campos 1c0857b53e
Fix default impl of aparse_result (#11702)
Should delegate to parse_result, not to aparse, as parse_result is a
method that some output parsers override

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
12 months ago
nuric 44da27c07b
Add SemaDB VST wrapper (#11484)
- **Description**: Adding vectorstore wrapper for
[SemaDB](https://rapidapi.com/semafind-semadb/api/semadb).
- **Issue**: None
- **Dependencies**: None
- **Twitter handle**: semafind

Checks performed:
- [x] `make format`
- [x] `make lint`
- [x] `make test`
- [x] `make spell_check`
- [x] `make docs_build`

Documentation added:

- SemaDB vectorstore wrapper tutorial
1 year ago
hsuyuming 0b743f005b
Feature/enhance huggingfacepipeline to handle different return type (#11394)
**Description:** Avoid huggingfacepipeline to truncate the response if
user setup return_full_text as False within huggingface pipeline.

**Dependencies:** : None
**Tag maintainer:**   Maybe @sam-h-bean ?

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
Leonid Kuligin 2aba9ab47e
Retriever based on GCP DocAI Warehouse (#11400)
- **Description:** implements a retriever on top of DocAI Warehouse (to
interact with existing enterprise documents)
  https://cloud.google.com/document-ai-warehouse?hl=en
  - **Issue:** new functionality
 
@baskaryan

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
Erick Friis a477ddda45
Langsmith in readme update (#11497) 1 year ago
Leonid Kuligin 9e81ab47be
Added a better error description if processor name is wrong. (#11488)
Replace this entire comment with:
  - **Description:** added a better error description for this error
  - **Issue:** #11407 
  
  @baskaryan
1 year ago
Robert Yi e75766b759
fix: incorrect arguments in clickhouse docstring (#11693)
fix docstring for clickhouse
1 year ago
Eugene Yurtsev 17b5090c18
Add `type` to Agent actions (#11682)
Add `type` to agent actions.
1 year ago
April c14a8df2ee
wrap confluence attachment processing with a try-except block (#11503)
Prevents document loading from erroring out when an attachment is not
found at the url.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
eajechiloae 4ba2c8ba75
Fix ClearML callback (#11472)
Handle different field names in dicts/dataframes, fixing the ClearML
callback.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
Lawrence Wu 93bb19f69a
Fix chains/loading.py error messages (#11688)
- **Description:** make the error messages consistent in
chains/loading.py
  - **Dependencies:** None
1 year ago
Harrison Chase 18ebce2032
fix tool async (#11689) 1 year ago
sudranga 9beb03e771
11474 (#11519)
No relevant documents may be found for a given question. In some use
cases, we could directly respond with a fixed message instead of doing
an LLM call with an empty context. This PR exposes this as an option:
response_if_no_docs_found.

---------

Co-authored-by: Sudharsan Rangarajan <sudranga@nile-global.com>
1 year ago
Joaquin Menendez ef99b06362
feature: add metadata information into the embedding file before uplo… (#11553)
Replace this entire comment with:
- **Description:** In this modified version of the function, if the
metadatas parameter is not None, the function includes the corresponding
metadata in the JSON object for each text. This allows the metadata to
be stored alongside the text's embedding in the vector store.
  - 
  - **Issue:** #10924
  - **Dependencies:** None
  - **Tag maintainer:** @hwchase17
@agola11
  - **Twitter handle:** @MelliJoaco

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
Marcin Wątroba 51a3a86022
#11655 Add SQLAlchemyMd5Cache implementation (#11660)
- **Description:** Add SQLAlchemyMd5Cache implementation, 
  - **Issue:** the issue # #11655,
  - **Dependencies:** no deps,
  - **Tag maintainer:** @markowanga

---------

Co-authored-by: Marcin Wątroba <marcin.watroba@pwr.edu.pl>
Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
Suresh Kumar Ponnusamy 70f7558db2
langchain-experimental: Add allow_list support in experimental/data_anonymizer (#11597)
- **Description:** Add allow_list support in langchain experimental
data-anonymizer package
  - **Issue:** no
  - **Dependencies:** no
  - **Tag maintainer:** @hwchase17
  - **Twitter handle:**
1 year ago
wemysschen 2363c02cf3
Bos loader (#11525)
**Description:**
Add  BaiduCloud BOS document loader.

---------

Co-authored-by: chenweixu01 <chenweixu01@baidu.com>
Co-authored-by: root <root@icoding-cwx.bcc-szzj.baidu.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
Kwanghoon Choi fbb82608cd
Fixed a bug in reporting Python code validation (#11522)
- **Description:** fixed a bug in pal-chain when it reports Python
    code validation errors. When node.func does not have any ids, the
    original code tried to print node.func.id in raising ValueError.
- **Issue:** n/a,
- **Dependencies:** no dependencies,
- **Tag maintainer:** @hazzel-cn, @eyurtsev
- **Twitter handle:** @lazyswamp

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
Harrison Chase 9f39c23a13
add input type for convo retrieval chain (#11679) 1 year ago
zhaozhiming d5e762d328
fix: Change the docs of JSONAgentOutputParser (#11594)
I am merely making some minor adjustments to the function documentation.
I hope to provide a small assistance to LangChain.
- **Description:** Change the docs of JSONAgentOutputParser. It will be
`JSON` better,
  - **Issue:** no,
  - **Dependencies:** no,
  - **Tag maintainer:** @hwchase17,
  - **Twitter handle:** Not worth mentioning.
1 year ago
Vinay Kakade dd0cd98861
Add support for ChatOpenAI models in Infino callback handler (#11608)
**Description:** This PR adds support for ChatOpenAI models in the
Infino callback handler. In particular, this PR implements
`on_chat_model_start` callback, so that ChatOpenAI models are supported.
With this change, Infino callback handler can be used to track latency,
errors, and prompt tokens for ChatOpenAI models too (in addition to the
support for OpenAI and other non-chat models it has today). The existing
example notebook is updated to show how to use this integration as well.
cc/ @naman-modi @savannahar68

**Issue:** https://github.com/langchain-ai/langchain/issues/11607 

**Dependencies:** None

**Tag maintainer:** @hwchase17 

**Twitter handle:** [@vkakade](https://twitter.com/vkakade)
1 year ago
Israel Ekpo d0603c86b6
Add Support for Azure Cosmos DB MongoDB vCore Vector Store #11627 (#11632)
This PR adds support for the Azure Cosmos DB MongoDB vCore Vector Store

https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb/vcore/

https://learn.microsoft.com/en-us/azure/cosmos-db/mongodb/vcore/vector-search

Summary:
- **Description:** added vector store integration for Azure Cosmos DB
MongoDB vCore Vector Store,
  - **Issue:** the issue # it fixes #11627,
  - **Dependencies:** pymongo dependency,
  - **Tag maintainer:** @hwchase17,
  - **Twitter handle:** @izzyacademy

---------

Co-authored-by: Israel Ekpo <israel.ekpo@gmail.com>
Co-authored-by: Israel Ekpo <44282278+izzyacademy@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
Erick Friis 28ee6a7c12
Track ChatFireworks time to first_token (#11672) 1 year ago
Eugene Yurtsev 539941281d
Fix output types for BaseChatModel (#11670)
* Should use non chunked messages for Invoke/Batch
* After this PR, stream output type is not represented, do we want to
use the union?

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
1 year ago
Eugene Yurtsev 99adcdb1c9
Add dedicated `type` attribute to be used solely for serialization purposes (#11585)
Adds standard `type` field for all messages that will be
serialized/validated by pydantic.

* The presence of `type` makes it easier for developers consuming
schemas to write client code to serialize/deserialize.
* In LangServe `type` will be used for both validation and will appear
in the generated openapi specs
1 year ago
eryk-dsai 06d5971be9
Fix issue #10985 - Skip model.to(device) if it is instantiated with bitsandbytes config (#11009)
Preventing error caused by attempting to move the model that was already
loaded on the GPU using the Accelerate module to the same or another
device. It is not possible to load model with Accelerate/PEFT to CPU for
now

Addresses:
[#10985](https://github.com/langchain-ai/langchain/issues/10985)
1 year ago
Nuno Campos 64969bc8ae
Add patch_config(configurable=) arg, make with_config(configurable=) merge it with existing (#11662)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
1 year ago
Harrison Chase ce0019b646
make utils conditional (#11646) 1 year ago
Harrison Chase 8f06085b24
make tools conditional (#11647) 1 year ago
Bassem Yacoube 5451b724fc
Adds support for llama2 and fixes MPT-7b url (#11465)
- **Description:** This is an update to OctoAI LLM provider that adds
support for llama2 endpoints hosted on OctoAI and updates MPT-7b url
with the current one.
@baskaryan
Thanks!

---------

Co-authored-by: ML Wiz <bassemgeorgi@gmail.com>
1 year ago
Todd Kerpelman 0bff399af1
Make metadata from the url_selenium loader match that of the web_base loader (#11617)
**Description:** I noticed the metadata returned by the url_selenium
loader was missing several values included by the web_base loader. (The
former returned `{source: ...}`, the latter returned `{source: ...,
title: ..., description: ..., language: ...}`.) This change fixes it so
both loaders return all 4 key value pairs.

Files have been properly formatted and all tests are passing. Note,
however, that I am not much of a python expert, so that whole "Adding
the imports inside the code so that tests pass" thing seems weird to me.
Please LMK if I did anything wrong.
1 year ago
Tarun Thotakura c9d4d53545
Fixed the assignment of custom_llm_provider argument (#11628)
- **Description:** Assigning the custom_llm_provider to the default
params function so that it will be passed to the litellm
- **Issue:** Even though the custom_llm_provider argument is being
defined it's not being assigned anywhere in the code and hence its not
being passed to litellm, therefore any litellm call which uses the
custom_llm_provider as required parameter is being failed. This
parameter is mainly used by litellm when we are doing inference via
Custom API server.
https://docs.litellm.ai/docs/providers/custom_openai_proxy
  - **Dependencies:** No dependencies are required

@krrishdholakia , @baskaryan

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
Leonid Ganeline db67ccb0bb
docstrings cleanup (#11640)
Added missed docstrings. Some reformatting.
1 year ago
Yang, Bo 3a82bd7bdb
Use raise from statement so that users can find detailed error message (#11461)
- **Description:** Use `raise from` statement so that users can find
detailed error message
  - **Tag maintainer:** @baskaryan, @eyurtsev, @hwchase17
1 year ago
Nuno Campos 9a0ed75a95
Add configurable fields with options (#11601)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
1 year ago
Bagatur 7232e082de
bump 312 (#11621) 1 year ago
Eugene Yurtsev 58220cda72
Remove LLM Bash and related bash utilities (#11619)
Deprecate LLMBash and related bash utilities
1 year ago
Shubham Kushwaha 49de862076
Arcee.ai LLM & Retriever integration (#11579)
- **Description:** This PR introduces a new LLM and Retriever API to
https://arcee.ai for the python client
  - **Issue:** implements the integrations as requested in #11578 ,
  - **Dependencies:** no dependencies are required,
  - **Tag maintainer:** @hwchase17
  - **Twitter handle:** shwooobham 


** `make format`, `make lint` and `make test` runs locally.**
```shell
=========== 1245 passed, 277 skipped, 20 warnings in 16.26s ===========
./scripts/check_pydantic.sh .
./scripts/check_imports.sh
poetry run ruff .
[ "." = "" ] || poetry run black . --check
All done!  🍰 
1818 files would be left unchanged.
[ "." = "" ] || poetry run mypy .
Success: no issues found in 1815 source files
[ "." = "" ] || poetry run black .
All done!  🍰 
1818 files left unchanged.
[ "." = "" ] || poetry run ruff --select I --fix .
poetry run codespell --toml pyproject.toml
poetry run codespell --toml pyproject.toml -w
```


**Contributions**
1. Arcee (langchain/llms), ArceeRetriever (langchain/retrievers),
ArceeWrapper (langchain/utilities)
2. docs for Arcee (llms/arcee.py) and
ArceeRetriever(retrievers/arcee.py)
3.

cc: @jacobsolawetz @ben-epstein

---------

Co-authored-by: Shubham <shubham@sORo.local>
1 year ago
Eugene Yurtsev b56ca0c2a4
Deprecate LLMSymbolicMath from langchain core (#11615)
Deprecate LLMSymbolicMath from langchain core package.
1 year ago
Eugene Yurtsev c9bce5bbfb
Add version to langchain_experimental (#11613)
Add version to langchain experimental
1 year ago
Predrag Gruevski 22abeb9f6c
Disable loading jinja2 `PromptTemplate` from file. (#10252)
jinja2 templates are not sandboxed and are at risk for arbitrary code
execution. To mitigate this risk:
- We no longer support loading jinja2-formatted prompt template files.
- `PromptTemplate` with jinja2 may still be constructed manually, but
the class carries a security warning reminding the user to not pass
untrusted input into it.

Resolves #4394.
1 year ago
Nuno Campos c7c03d4709
Fix mutation bugs in callback manager configure (#11603)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
1 year ago
cccs-eric e2a9072b80
Fix CohereRerank configuration (#11583)
**Description:** CohereRerank is missing `cohere_api_key` as a field and
since extras are forbidden, it is not possible to pass-in the key. The
only way is to use an env variable named `COHERE_API_KEY`.

For example, if trying to create a compressor like this:
```python
cohere_api_key = "......Cohere api key......"
compressor = CohereRerank(cohere_api_key=cohere_api_key)
```
you will get the following error:
```
  File "/langchain/.venv/lib/python3.10/site-packages/pydantic/v1/main.py", line 341, in __init__
    raise validation_error
pydantic.v1.error_wrappers.ValidationError: 1 validation error for CohereRerank
cohere_api_key
  extra fields not permitted (type=value_error.extra)
```
1 year ago
Anar 55fef4b64b
implemented add files method in LLMRails (#11518)
This PR provides add files method with LLMRails. Implemented here are:

docs/extras/integrations/vectorstores/llm-rails.ipynb

---------

Co-authored-by: Anar Aliyev <aaliyev@mgmt.cloudnet.services>
1 year ago
Stephen Hankinson 316dddc7cd
fix wording of query_sql_database_tool_description (#11530)
- **Description:** Fixes minor typo for the
query_sql_database_tool_description in the db toolkit
  - **Issue:** N/A
  - **Dependencies:** N/A
  - **Tag maintainer:** @nfcampos 
  - **Twitter handle:** N/A
1 year ago
Ash Vardanian 1acfe86353
Accelerating Math Utils with SimSIMD (#11566)
LangChain relies on NumPy to compute cosine distances, which becomes a
bottleneck with the growing dimensionality and number of embeddings. To
avoid this bottleneck, in our libraries at
[Unum](https://github.com/unum-cloud), we have created a specialized
package - [SimSIMD](https://github.com/ashvardanian/simsimd), that knows
how to use newer hardware capabilities. Compared to SciPy and NumPy, it
reaches 3x-200x performance for various data types. Since publication,
several LangChain users have asked me if I can integrate it into
LangChain to accelerate their workflows, so here I am 🤗

## Benchmarking

To conduct benchmarks locally, run this in your Jupyter:

```py
import numpy as np
import scipy as sp
import simsimd as simd
import timeit as tt

def cosine_similarity_np(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    X_norm = np.linalg.norm(X, axis=1)
    Y_norm = np.linalg.norm(Y, axis=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        similarity = np.dot(X, Y.T) / np.outer(X_norm, Y_norm)
    similarity[np.isnan(similarity) | np.isinf(similarity)] = 0.0
    return similarity

def cosine_similarity_sp(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    return 1 - sp.spatial.distance.cdist(X, Y, metric='cosine')

def cosine_similarity_simd(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    return 1 - simd.cdist(X, Y, metric='cosine')

X = np.random.randn(1, 1536).astype(np.float32)
Y = np.random.randn(1, 1536).astype(np.float32)
repeat = 1000

print("NumPy: {:,.0f} ops/s, SciPy: {:,.0f} ops/s, SimSIMD: {:,.0f} ops/s".format(
    repeat / tt.timeit(lambda: cosine_similarity_np(X, Y), number=repeat),
    repeat / tt.timeit(lambda: cosine_similarity_sp(X, Y), number=repeat),
    repeat / tt.timeit(lambda: cosine_similarity_simd(X, Y), number=repeat),
))
```

## Results

I ran this on an M2 Pro Macbook for various data types and different
number of rows in `X` and reformatted the results as a table for
readability:

| Data Type | NumPy | SciPy | SimSIMD |
| :--- | ---: | ---: | ---: |
| `f32, 1` | 59,114 ops/s | 80,330 ops/s | 475,351 ops/s |
| `f16, 1` | 32,880 ops/s | 82,420 ops/s | 650,177 ops/s |
| `i8, 1` | 47,916 ops/s | 115,084 ops/s | 866,958 ops/s |
| `f32, 10` | 40,135 ops/s | 24,305 ops/s | 185,373 ops/s |
| `f16, 10` | 7,041 ops/s | 17,596 ops/s | 192,058 ops/s |
| `f16, 10` | 21,989 ops/s | 25,064 ops/s | 619,131 ops/s |
| `f32, 100` | 3,536 ops/s | 3,094 ops/s | 24,206 ops/s |
| `f16, 100` | 900 ops/s | 2,014 ops/s | 23,364 ops/s |
| `i8, 100` | 5,510 ops/s | 3,214 ops/s | 143,922 ops/s |

It's important to note that SimSIMD will underperform if both matrices
are huge.
That, however, seems to be an uncommon usage pattern for LangChain
users.
You can find a much more detailed performance report for different
hardware models here:

- [Apple M2
Pro](https://ashvardanian.com/posts/simsimd-faster-scipy/#appendix-1-performance-on-apple-m2-pro).
- [4th Gen Intel Xeon
Platinum](https://ashvardanian.com/posts/simsimd-faster-scipy/#appendix-2-performance-on-4th-gen-intel-xeon-platinum-8480).
- [AWS Graviton
3](https://ashvardanian.com/posts/simsimd-faster-scipy/#appendix-3-performance-on-aws-graviton-3).
  
## Additional Notes

1. Previous version used `X = np.array(X)`, to repackage lists of lists.
It's an anti-pattern, as it will use double-precision floating-point
numbers, which are slow on both CPUs and GPUs. I have replaced it with
`X = np.array(X, dtype=np.float32)`, but a more selective approach
should be discussed.
2. In numerical computations, it's recommended to explicitly define
tolerance levels, which were previously avoided in
`np.allclose(expected, actual)` calls. For now, I've set absolute
tolerance to distance computation errors as 0.01: `np.allclose(expected,
actual, atol=1e-2)`.

---

  - **Dependencies:** adds `simsimd` dependency
  - **Tag maintainer:** @hwchase17
  - **Twitter handle:** @ashvardanian

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
benchello 5de64e6d60
Add option to specify metadata columns in CSV loader (#11576)
#### Description
This PR adds the option to specify additional metadata columns in the
CSVLoader beyond just `Source`.

The current CSV loader includes all columns in `page_content` and if we
want to have columns specified for `page_content` and `metadata` we have
to do something like the below.:
```
csv = pd.read_csv(
        "path_to_csv"
    ).to_dict("records")

documents = [
        Document(
            page_content=doc["content"],
            metadata={
                "last_modified_by": doc["last_modified_by"],
                "point_of_contact": doc["point_of_contact"],
            }
        ) for doc in csv
    ]
```
#### Usage
Example Usage:
```
csv_test  =  CSVLoader(
      file_path="path_to_csv", 
      metadata_columns=["last_modified_by", "point_of_contact"]
 )
```
Example CSV:
```
content, last_modified_by, point_of_contact
"hello world", "Person A", "Person B"
```

Example Result:
```
Document {
 page_content: "hello world"
 metadata: {
 row: '0',
 source: 'path_to_csv',
 last_modified_by: 'Person A',
 point_of_contact: 'Person B',
 }
```

---------

Co-authored-by: Ben Chello <bchello@dropbox.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago