Commit Graph

2094 Commits (0797358c1b8ee18d1e23885661e7097098f40bdb)

Author SHA1 Message Date
Sean Bearden 77a15fa988
Added ability to pass arguments to the Playwright browser (#13146)
- **Description:** Enhanced `create_sync_playwright_browser` and
`create_async_playwright_browser` functions to accept a list of
arguments. These arguments are now forwarded to
`browser.chromium.launch()` for customizable browser instantiation.
  - **Issue:** #13143
  - **Dependencies:** None
  - **Tag maintainer:** @eyurtsev,
  - **Twitter handle:** Dr_Bearden

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
10 months ago
Joan Fontanals dcccf8fa66
adapt Jina Embeddings to new Jina AI Embedding API (#13658)
- **Description:** Adapt JinaEmbeddings to run with the new Jina AI
Embedding platform
- **Twitter handle:** https://twitter.com/JinaAI_

---------

Co-authored-by: Joan Fontanals Martinez <joan.fontanals.martinez@jina.ai>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
10 months ago
guillaumedelande ea0afd07ca
Update azuresearch.py following recent change from azure-search-documents library (#13472)
- **Description:** 

Reference library azure-search-documents has been adapted in version
11.4.0:

1. Notebook explaining Azure AI Search updated with most recent info
2. HnswVectorSearchAlgorithmConfiguration --> HnswAlgorithmConfiguration
3. PrioritizedFields(prioritized_content_fields) -->
SemanticPrioritizedFields(content_fields)
4. SemanticSettings --> SemanticSearch
5. VectorSearch(algorithm_configurations) -->
VectorSearch(configurations)

--> Changes now reflected on Langchain: default vector search config
from langchain is now compatible with officially released library from
Azure.

  - **Issue:**
Issue creating a new index (due to wrong class used for default vector
search configuration) if using latest version of azure-search-documents
with current langchain version
  - **Dependencies:** azure-search-documents>=11.4.0,
  - **Tag maintainer:** ,

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
10 months ago
price-deshaw 5cb3393e20
update OpenAI function agents' llm validation (#13538)
- **Description:** This PR modifies the LLM validation in OpenAI
function agents to check whether the LLM supports OpenAI functions based
on a property (`supports_oia_functions`) instead of whether the LLM
passed to the agent `isinstance` of `ChatOpenAI`. This allows classes
that extend `BaseChatModel` to be passed to these agents as long as
they've been integrated with the OpenAI APIs and have this property set,
even if they don't extend `ChatOpenAI`.
  - **Issue:** N/A
  - **Dependencies:** none
10 months ago
Max Weng 74c7b799ef
migrate openai audio api (#13557)
for issue https://github.com/langchain-ai/langchain/issues/13162
migrate openai audio api, as [openai v1.0.0 Migration
Guide](https://github.com/openai/openai-python/discussions/742)

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Double Max <max@ground-map.com>
10 months ago
Arnaud Gelas abbba6c7d8
openapi/planner.py: Deal with json in markdown output cases (#13576)
- **Description:** In openapi/planner deal with json in markdown output
cases
- **Issue:** In some cases LLMs could return json in markdown which
can't be loaded.
  - **Dependencies:**
  - **Tag maintainer:** @eyurtsev
  - **Twitter handle:**

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
10 months ago
Harrison Chase 8eab4d95c0
Harrison/delegate from template (#14266)
Co-authored-by: M.R. Sopacua <144725145+msopacua@users.noreply.github.com>
10 months ago
Nolan b49104c2c9
Add missing doc key to metadata field in AzureSearch Vectorstore (#13328)
- **Description:** Adds doc key to metadata field when adding document
to Azure Search.
  - **Issue:** -,
  - **Dependencies:** -,
  - **Tag maintainer:** @eyurtsev,
  - **Twitter handle:** @finnless

Right now the document key with the name FIELDS_ID is not included in
the FIELDS_METADATA field, and therefore is not included in the Document
returned from a query. This is really annoying if you want to be able to
modify that item in the vectorstore.

Other's thoughts on this are welcome.
10 months ago
Jon Watte e042e5df35
fix: call _on_llm_error() (#13581)
Description: There's a copy-paste typo where on_llm_error() calls
_on_chain_error() instead of _on_llm_error().
Issue: #13580 
Dependencies: None
Tag maintainer: @hwchase17 
Twitter handle: @jwatte

"Run `make format`, `make lint` and `make test` to check this locally."
The test scripts don't work in a plain Ubuntu LTS 20.04 system.
It looks like the dev container pulling is stuck. Or maybe the internet
is just ornery today.

---------

Co-authored-by: jwatte <jwatte@observeinc.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
10 months ago
Hamza Ahmed fcc8e5e839
Update geodataframe.py (#13573)
here it is validating shapely.geometry.point.Point: if not
isinstance(data_frame[page_content_column].iloc[0], gpd.GeoSeries):
raise ValueError(
f"Expected data_frame[{page_content_column}] to be a GeoSeries" you need
it to validate the geoSeries and not the shapely.geometry.point.Point

if not isinstance(data_frame[page_content_column], gpd.GeoSeries):
            raise ValueError(
f"Expected data_frame[{page_content_column}] to be a GeoSeries"

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
10 months ago
Harrison Chase 2213fc9711
Harrison/bookend ai (#14258)
Co-authored-by: stvhu-bookend <142813359+stvhu-bookend@users.noreply.github.com>
10 months ago
cxumol 0d47d15a9f
add(feat): Text Embeddings by Cloudflare Workers AI (#14220)
Add [Text Embeddings by Cloudflare Workers
AI](https://developers.cloudflare.com/workers-ai/models/text-embeddings/).
It's a new integration.
Trying to align it with its langchain-js version counterpart
[here](https://api.js.langchain.com/classes/embeddings_cloudflare_workersai.CloudflareWorkersAIEmbeddings.html).
- Dependencies: N/A
- Done `make format` `make lint` `make spell_check` `make
integration_tests` and all my changes was passed
10 months ago
Harrison Chase c51001f01e
fix comet tracer (#14259) 10 months ago
Harrison Chase 4fb72ff76f
fake consistent embeddings cleanup (#14256)
delete code that could never be reached
10 months ago
Michael Landis e26906c1dc
feat: implement max marginal relevance for momento vector index (#13619)
**Description**

Implements `max_marginal_relevance_search` and
`max_marginal_relevance_search_by_vector` for the Momento Vector Index
vectorstore.

Additionally bumps the `momento` dependency in the lock file and adds
logging to the implementation.

**Dependencies**

 updates `momento` dependency in lock file

**Tag maintainer**

@baskaryan 

**Twitter handle**

Please tag @momentohq for Momento Vector Index and @mloml for the
contribution 🙇

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
10 months ago
deedy5 ee9abb6722
Bugfix duckduckgo_search news search (#13670)
- **Description:** 
Bugfix duckduckgo_search news search
  - **Issue:** 
https://github.com/langchain-ai/langchain/issues/13648
  - **Dependencies:** 
None
  - **Tag maintainer:** 
@baskaryan

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
10 months ago
Aliaksandr Kuzmik 676a077c4e
Add CometTracer (#13661)
Hi! I'm Alex, Python SDK Team Lead from
[Comet](https://www.comet.com/site/).

This PR contains our new integration between langchain and Comet -
`CometTracer` class which uses new `comet_llm` python package for
submitting data to Comet.

No additional dependencies for the langchain package are required
directly, but if the user wants to use `CometTracer`, `comet-llm>=2.0.0`
should be installed. Otherwise an exception will be raised from
`CometTracer.__init__`.

A test for the feature is included.

There is also an already existing callback (and .ipynb file with
example) which ideally should be deprecated in favor of a new tracer. I
wasn't sure how exactly you'd prefer to do it. For example we could open
a separate PR for that.

I'm open to your ideas :)
10 months ago
Harrison Chase 921c4b5597
Harrison/searchapi (#14252)
Co-authored-by: SebastjanPrachovskij <86522260+SebastjanPrachovskij@users.noreply.github.com>
10 months ago
Colin Ulin 9f9cb71d26
Embaas - added backoff retries for network requests (#13679)
Running a large number of requests to Embaas' servers (or any server)
can result in intermittent network failures (both from local and
external network/service issues). This PR implements exponential backoff
retries to help mitigate this issue.
10 months ago
Kastan Day 65faba91ad
langchain[patch]: Adding new Github functions for reading pull requests (#9027)
The Github utilities are fantastic, so I'm adding support for deeper
interaction with pull requests. Agents should read "regular" comments
and review comments, and the content of PR files (with summarization or
`ctags` abbreviations).

Progress:
- [x] Add functions to read pull requests and the full content of
modified files.
- [x] Function to use Github's built in code / issues search.

Out of scope:
- Smarter summarization of file contents of large pull requests (`tree`
output, or ctags).
- Smarter functions to checkout PRs and edit the files incrementally
before bulk committing all changes.
- Docs example for creating two agents:
- One watches issues: For every new issue, open a PR with your best
attempt at fixing it.
- The other watches PRs: For every new PR && every new comment on a PR,
check the status and try to finish the job.

<!-- Thank you for contributing to LangChain!

Replace this comment with:
  - Description: a description of the change, 
  - Issue: the issue # it fixes (if applicable),
  - Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!

Please make sure you're PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
  2. an example notebook showing its use.

Maintainer responsibilities:
  - General / Misc / if you don't know who to tag: @baskaryan
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
  - Models / Prompts: @hwchase17, @baskaryan
  - Memory: @hwchase17
  - Agents / Tools / Toolkits: @hinthornw
  - Tracing / Callbacks: @agola11
  - Async: @agola11

If no one reviews your PR within a few days, feel free to @-mention the
same people again.

See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
 -->

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
10 months ago
Hynek Kydlíček aa8ae31e5b
core[patch]: add response kwarg to on_llm_error
# Dependencies
None

# Twitter handle
@HKydlicek

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
10 months ago
Jacob Lee a26c4a0930
Allow base_store to be used directly with MultiVectorRetriever (#14202)
Allow users to pass a generic `BaseStore[str, bytes]` to
MultiVectorRetriever, removing the need to use the `create_kv_docstore`
method. This encoding will now happen internally.

@rlancemartin @eyurtsev

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
10 months ago
Vincent Brouwers 67662564f3
langchain[patch]: Fix `config` arg detection for wrapped lambdarunnable (#14230)
**Description:**
When a RunnableLambda only receives a synchronous callback, this
callback is wrapped into an async one since #13408. However, this
wrapping with `(*args, **kwargs)` causes the `accepts_config` check at
[/libs/core/langchain_core/runnables/config.py#L342](ee94ef55ee/libs/core/langchain_core/runnables/config.py (L342))
to fail, as this checks for the presence of a "config" argument in the
method signature.

Adding a `functools.wraps` around it, resolves it.
10 months ago
Jacob Lee de86b84a70
Prefer byte store interface for Upstash BaseStore to match other Redis (#14201)
If we are not going to make the existing Docstore class also implement
`BaseStore[str, Document]`, IMO all base store implementations should
always be `[str, bytes]` so that they are more interchangeable.

CC @rlancemartin @eyurtsev
10 months ago
Harrison Chase 411aa9a41e
Harrison/nasa tool (#14245)
Co-authored-by: Jacob Matias <88005863+matiasjacob25@users.noreply.github.com>
Co-authored-by: Karam Daid <karam.daid@mail.utoronto.ca>
Co-authored-by: Jumana <jumana.fanous@mail.utoronto.ca>
Co-authored-by: KaramDaid <38271127+KaramDaid@users.noreply.github.com>
Co-authored-by: Anna Chester <74325334+CodeMakesMeSmile@users.noreply.github.com>
Co-authored-by: Jumana <144748640+jfanous@users.noreply.github.com>
10 months ago
nceccarelli 5fea63327b
Support Azure gov cloud in Azure Cognitive Search retriever (#13695)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
- **Description:** The existing version hardcoded search.windows.net in
the base url. This is not compatible with the gov cloud. I am allowing
the user to override the default for gov cloud support.,
  - **Issue:** N/A, did not write up in an issue,
  - **Dependencies:** None

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Nicholas Ceccarelli <nceccarelli2@moog.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
10 months ago
ealt e09b876863
Fixes error loading Obsidian templates (#13888)
- **Description:** Obsidian templates can include
[variables](https://help.obsidian.md/Plugins/Templates#Template+variables)
using double curly braces. `ObsidianLoader` uses PyYaml to parse the
frontmatter of documents. This parsing throws an error when encountering
variables' curly braces. This is avoided by temporarily substituting
safe strings before parsing.
  - **Issue:** #13887
  - **Tag maintainer:** @hwchase17
10 months ago
Nithish Raghunandanan eecfa3f9e5
Add Couchbase document loader (#13979)
**Description:** 
Adds the document loader for [Couchbase](http://couchbase.com/), a
distributed NoSQL database.
**Dependencies:** 
Added the Couchbase SDK as an optional dependency.
**Twitter handle:** nithishr

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
10 months ago
Muntaqa Mahmood 25f72944a0
Add: Steam API tool (#14008)
- **Description:** Our PR is an integration of a Steam API Tool that
makes recommendations on steam games based on user's Steam profile and
provides information on games based on user provided queries.
- **Issue:** the issue # our PR implements:
https://github.com/langchain-ai/langchain/issues/12120
- **Dependencies:** python-steam-api library, steamspypi library and
decouple library
  - **Tag maintainer:** @baskaryan, @hwchase17 
  - **Twitter handle:** N/A

Hello langchain Maintainers,

We are a team of 4 University of Toronto students contributing to
langchain as part of our course [CSCD01 (link to course
page)](https://cscd01.com/work/open-source-project). We hope our changes
help the community. We have run make format, make lint and make test
locally before submitting the PR. To our knowledge, our changes do not
introduce any new errors.

Our PR integrates the python-steam-api, steamspypi and decouple
packages. We have added integration tests to test our python API
integration into langchain and an example notebook is also provided.

Our amazing team that contributed to this PR: @JohnY2002, @shenceyang,
@andrewqian2001 and @muntaqamahmood

Thank you in advance to all the maintainers for reviewing our PR!

---------

Co-authored-by: Shence <ysc1412799032@163.com>
Co-authored-by: JohnY2002 <johnyuan0526@gmail.com>
Co-authored-by: Andrew Qian <andrewqian2001@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: JohnY <94477598+JohnY2002@users.noreply.github.com>
10 months ago
Bob Lin cd2028288e
Add openai v2 adapter (#14063)
### Description

Starting from [openai version
1.0.0](17ac677995 (module-level-client)),
the camel case form of `openai.ChatCompletion` is no longer supported
and has been changed to lowercase `openai.chat.completions`. In
addition, the returned object only accepts attribute access instead of
index access:

```python
import openai

# optional; defaults to `os.environ['OPENAI_API_KEY']`
openai.api_key = '...'

# all client options can be configured just like the `OpenAI` instantiation counterpart
openai.base_url = "https://..."
openai.default_headers = {"x-foo": "true"}

completion = openai.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "user",
            "content": "How do I output all files in a directory using Python?",
        },
    ],
)
print(completion.choices[0].message.content)
```

So I implemented a compatible adapter that supports both attribute
access and index access:

```python
In [1]: from langchain.adapters import openai as lc_openai
   ...: messages = [{"role": "user", "content": "hi"}]

In [2]: result = lc_openai.chat.completions.create(
   ...:     messages=messages, model="gpt-3.5-turbo", temperature=0
   ...: )

In [3]: result.choices[0].message
Out[3]: {'role': 'assistant', 'content': 'Hello! How can I assist you today?'}

In [4]: result["choices"][0]["message"]
Out[4]: {'role': 'assistant', 'content': 'Hello! How can I assist you today?'}

In [5]: result = await lc_openai.chat.completions.acreate(
   ...:     messages=messages, model="gpt-3.5-turbo", temperature=0
   ...: )

In [6]: result.choices[0].message
Out[6]: {'role': 'assistant', 'content': 'Hello! How can I assist you today?'}

In [7]: result["choices"][0]["message"]
Out[7]: {'role': 'assistant', 'content': 'Hello! How can I assist you today?'}

In [8]: for rs in lc_openai.chat.completions.create(
    ...:     messages=messages, model="gpt-3.5-turbo", temperature=0, stream=True
    ...: ):
    ...:     print(rs.choices[0].delta)
    ...:     print(rs["choices"][0]["delta"])
    ...:
{'role': 'assistant', 'content': ''}
{'role': 'assistant', 'content': ''}
{'content': 'Hello'}
{'content': 'Hello'}
{'content': '!'}
{'content': '!'}

In [20]: async for rs in await lc_openai.chat.completions.acreate(
    ...:     messages=messages, model="gpt-3.5-turbo", temperature=0, stream=True
    ...: ):
    ...:     print(rs.choices[0].delta)
    ...:     print(rs["choices"][0]["delta"])
    ...:
{'role': 'assistant', 'content': ''}
{'role': 'assistant', 'content': ''}
{'content': 'Hello'}
{'content': 'Hello'}
{'content': '!'}
{'content': '!'}
...
```

### Twitter handle

[lin_bob57617](https://twitter.com/lin_bob57617)
10 months ago
billytrend-cohere 0f02081392
Add input_type override (#14068)
Add option to override input_type for cohere's v3 embeddings models

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
10 months ago
Dmitrii Rashchenko aaabc1574f
Support of custom hugging face inference endpoints url (#14125)
- **Description:** to support not only publicly available Hugging Face
endpoints, but also protected ones (created with "Inference Endpoints"
Hugging Face feature), I have added ability to specify custom api_url.
But if not specified, default behaviour won't change
  - **Issue:** #9181,
  - **Dependencies:** no extra dependencies
10 months ago
Harrison Chase e32185193e
Harrison/embass (#14242)
Co-authored-by: Julius Lipp <lipp.julius@gmail.com>
10 months ago
umair mehmood 8504ec56e4
fixed: ModuleNotFoundError: No module named 'clarifai.auth' (#14215)
Updated the clarifai imports 

fixed: #14175 

@efriis 
@baskaryan
10 months ago
Hieu Lam ca8a022cd9
Fixed OpenAIFunctionsAgent not returning when receiving AgentFinish (#14236)
**Description:** The way the condition is checked in the
`return_stopped_response` function of `OpenAIAgent` may not be correct,
when the value returned is `AgentFinish` from the tools it does not work
properly.


Thanks for review, @baskaryan, @eyurtsev, @hwchase17.
10 months ago
Unai Garay Maestre 6826feea14
Adds `llm_chain_kwargs` to `BaseRetrievalQA.from_llm` (#14224)
- **Description:** Adds `llm_chain_kwargs` to `BaseRetrievalQA.from_llm`
so these can be passed to the LLM at runtime,
- **Issue:** https://github.com/langchain-ai/langchain/issues/14216,

---------

Signed-off-by: ugm2 <unaigaraymaestre@gmail.com>
10 months ago
James Braza 6ce5dab38c
Clarifying descriptions in `GuardrailsOutputParser` (#14228)
Upstreaming knowledge from
https://github.com/guardrails-ai/guardrails/discussions/473 to LangChain
10 months ago
geret1 50aee687c6
langchain[patch]: Cerebrium model_api_request deprecation (#12704)
- **Description:** As part of my conversation with Cerebrium team,
`model_api_request` will be no longer available in cerebrium lib so it
needs to be replaced.
  - **Issue:** #12705 12705,
  - **Dependencies:** Cerebrium team (agreed)
  - **Tag maintainer:** @eyurtsev 
  - **Twitter handle:** No official Twitter account sorry :D

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
10 months ago
William FH 246dc4f9cc
langchain[patch]: Pass kwargs to chat fireworks (#14183)
Otherwise `.bind()` isn't really any good
10 months ago
Kaiboon Ee e961c57fd2
langchain[patch]: Mask API key for Arcee LLM (#14193)
- **Description:** Mask API key for Arcee LLM and its associated unit
tests
  - **Issue:** https://github.com/langchain-ai/langchain/issues/12165
  - **Dependencies:** N/A
  - **Tag maintainer:** @eyurtsev
  - **Twitter handle:** `eekaiboon`

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
10 months ago
Daniyar Supiyev 092f302c0f
langchain[patch]: Asynchronous human-in-the-loop callback (#14195)
**Description:** Adding a possibility to use asynchronous callback
handler in human-in-the-loop validation tool. Very useful, for example,
if you want to implement a validation over Telegram bot.
**Issue:** -
**Dependencies:** -

---------

Co-authored-by: Daniyar_Supiyev <daniyar_supiyev@epam.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
10 months ago
Mark Cusack 16c83f786c
Adds the Yellowbrick Data Warehouse as a supported vector store (#13820)
- **Description** An integration to allow the Yellowbrick Data Warehouse
to function as a vector store

---------

Co-authored-by: markcusack <markcusack@markcusacksmac.lan>
Co-authored-by: markcusack <markcusack@Mark-Cusack-sMac.local>
10 months ago
Hendrik Hogertz e6862e6e7d
Fix Azure Openai function calling in streaming mode (#13768)
- **Description**: This PR addresses an issue with the OpenAI API
streaming response, where initially the key (arguments) is provided but
the value is None. Subsequently, it updates with {"arguments": "{\n"},
leading to a type inconsistency that causes an exception. The specific
error encountered is ValueError: additional_kwargs["arguments"] already
exists in this message, but with a different type. This change aims to
resolve this inconsistency and ensure smooth API interactions.
- **Issue**: None.
- **Dependencies**: None.
- **Tag maintainer**: @eyurtsev

This is an updated version of #13229 based on the refactored code.
Credit goes to @superken01.

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
10 months ago
Nicolò Boschi e204657b3c
AstraDB VectorStore: implement pre_delete_collection (#13780)
- **Description:** some vector stores have a flag for try deleting the
collection before creating it (such as ´vectorpg´). This is a useful
flag when prototyping indexing pipelines and also for integration tests.
Added the bool flag `pre_delete_collection ` to the constructor (default
False)
  - **Tag maintainer:** @hemidactylus 
  - **Twitter handle:** nicoloboschi

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
10 months ago
Chelsea E. Manning 2780d2d4dd
Extend OpenAIEmbeddings class to support non-`tiktoken` based embeddings (#13884)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
- **Description:** This extends `OpenAIEmbeddings` to add support for
non-`tiktoken` based embeddings, specifically for use with the new
`text-generation-webui` API (`--extensions openai`) which does not
support `tiktoken` encodings, but rather strings
  - **Issue:** Not found,
- **Dependencies:** HuggingFace `transformers.AutoTokenizer` is new
dependency for running the model without `tiktoken`
- **Tag maintainer:** @baskaryan based on last commit for
`langchain-core` refactor
  - **Twitter handle:** @xychelsea

Modified the tokenization process to be model-agnostic, allowing for
both OpenAI and non-OpenAI model tokenizations, by setting the new
default `bool` flag `tiktoken_enabled` to `False`. This requeires
HuggingFace’s AutoTokenizer and handling tokenization for models
requiring different preprocessing steps to generate a chunked string
request rather than a list of integers.

Updated the embeddings generation process to accommodate non-OpenAI
models. This includes converting tokenized text into embeddings using
OpenAI’s and Hugging Face’s model architectures.
 -->
10 months ago
Changgeng Zhao 9b59bde93d
Update Hologres vector store: use hologres-vector (#13767)
Hi,
I made some code changes on the Hologres vector store to improve the
data insertion performance.
Also, this version of the code uses `hologres-vector` library. This
library is more convenient for us to update, and more efficient in
performance.
The code has passed the format/lint/spell check. I have run the unit
test for Hologres connecting to my own database.
Please check this PR again and tell me if anything needs to change.

Best,
Changgeng,
Developer @ Alibaba Cloud

Co-authored-by: Changgeng Zhao <zhaochanggeng.zcg@alibaba-inc.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
10 months ago
Nicolò Boschi 0de7cf898d
Ensure AstraDB integration tests clean up the environment (#13774)
- **Description:** currently astra_db integration tests might leave
orphan collections
  - **Tag maintainer:** @hemidactylus 
  - **Twitter handle:** nicoloboschi
10 months ago
Chad Norvell 8a0951d934
Fix Mathpix PDF loader integration (#13949)
- **Description:** Fixes the Mathpix PDF loader API integration.
Specifically, ensures that Mathpix auth headers are provided for every
request, and ensures that we recognize all errors that can occur during
a request. Also, the option to provide API keys as kwargs never actually
worked before, but now that's fixed too.
  - **Issue:** #11249
  - **Dependencies:** None
10 months ago
gzyJoy 32d4bb4590
Added Slacktoolkit (#14012)
- **Description:** 
This PR introduces the Slack toolkit to LangChain, which allows users to
read and write to Slack using the Slack API. Specifically, we've added
the following tools.
1. get_channel: Provides a summary of all the channels in a workspace.
2. get_message: Gets the message history of a channel.
3. send_message: Sends a message to a channel.
4. schedule_message: Sends a message to a channel at a specific time and
date.

- **Issue:** This pull request addresses [Add Slack Toolkit
#11747](https://github.com/langchain-ai/langchain/issues/11747)
  - **Dependencies:** package`slack_sdk`
Note: For this toolkit to function you will need to add a Slack app to
your workspace. Additional info can be found
[here](https://slack.com/help/articles/202035138-Add-apps-to-your-Slack-workspace).

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: ArianneLavada <ariannelavada@gmail.com>
Co-authored-by: ArianneLavada <84357335+ArianneLavada@users.noreply.github.com>
Co-authored-by: ariannelavada@gmail.com <you@example.com>
10 months ago
Richie 99e5ee6a84
fix(vectorstores): incorrect import for mongodb atlas DriverInfo (#14060)
- **Description:** fix `import` issue for `mongodb atlas` vectore store
integration
  - **Issue:** none
  - **Dependencies:** none

while trying to follow official `langchain`'s [mongodb integration
guide](https://python.langchain.com/docs/integrations/vectorstores/mongodb_atlas),
an import error will happen.

It's caused by incorrect import location:
- `from pymongo import DriverInfo` should be `from pymongo.driver_info
import DriverInfo`
- reference: [pymongo's DriverInfo
class](https://pymongo.readthedocs.io/en/stable/api/pymongo/driver_info.html#pymongo.driver_info.DriverInfo)

Thanks!
10 months ago
James Braza 3833882ab7
Removing extra `StdOutCallbackHandler` overridden methods (#14136)
Unnecessarily overridden methods:

- Give the idea the subclass is doing something special (when it isn't)
- Block CTRL-click to the actual method

This PR removes some unnecessarily overridden methods in
`StdOutCallbackHandler`

Supercedes https://github.com/langchain-ai/langchain/pull/12858
10 months ago
James Braza 052e23be3e
Added Python `logging` tracer (#14190)
This PR creates a logging handler and adds a simple unit test of it

Supercedes https://github.com/langchain-ai/langchain/pull/12862

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
10 months ago
Bob Lin 62505043be
Closed #14069 (#14166)
### Description

Fix #14069

### Twitter handle

[lin_bob57617](https://twitter.com/lin_bob57617)
10 months ago
Yong woo Song 9938086df0
Fix Html2TextTransformer for shallow copy (#14197)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
Hi,
There is some unintended behavior in Html2TextTransformer.
The current code is **directly modifying the original documents that are
passed as arguments to the function.**
Therefore, not only the return of the function but also the input
variables are being modified simultaneously.
**To resolve this, I added unit test code as well.**

reference link: [Shallow vs Deep Copying of Python
Objects](https://realpython.com/copying-python-objects/)

Thanks! ☺️
10 months ago
h3l 818252b1f8
Fix: (issue #14127) Volc Engine MaaS import error (#14194)
- **Description:** fix Volc Engine MaaS import error
- **Issue:** [the issue # it fixes (if
applicable),](https://github.com/langchain-ai/langchain/issues/14127)
  - **Dependencies:** None
  - **Tag maintainer:** @baskaryan 
  - **Twitter handle:**

Co-authored-by: lvzhong <lvzhong@bytedance.com>
10 months ago
Bagatur 0bdb434383
langchain[patch]: Release langchain 0.0.345 (#14184) 10 months ago
Bagatur 15c04a5670
core[patch]: Release 0.0.9 (#14182) 10 months ago
James Braza bdb6ae2ed3
core[patch]: `BaseTracer` helper method for `Run` lookup (#14139)
I observed the same run ID extraction logic is repeated many times in
`BaseTracer`.

This PR creates a helper method for DRY code.
10 months ago
Harutaka Kawamura 41ee3be95f
langchain[patch]: Support passing parameters to `llms.Databricks` and `llms.Mlflow` (#14100)
Before, we need to use `params` to pass extra parameters:

```python
from langchain.llms import Databricks

Databricks(..., params={"temperature": 0.0})
```

Now, we can directly specify extra params:

```python
from langchain.llms import Databricks

Databricks(..., temperature=0.0)
```
10 months ago
Abdul 82102c99b3
langchain[patch]: Running SQLDatabaseChain adds prefix "SQLQuery:\n" (#14058)
- **Issue:** https://github.com/langchain-ai/langchain/issues/12077

---------

Co-authored-by: Abdul Kader Maliyakkal <maliyakk@amazon.com>
10 months ago
Samuel Kemp fd781c89cc
langchain[minor]: add azure ai data document loader (#13404)
This PR adds an "Azure AI data" document loader, which allows Azure AI
users to load their registered data assets as a document object in
langchain.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
10 months ago
James Braza 24385a00de
core[minor], langchain[patch], experimental[patch]: Added missing `py.typed` to `langchain_core` (#14143)
See PR title.

From what I can see, `poetry` will auto-include this. Please let me know
if I am missing something here.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
10 months ago
quantum00549 f7c257553d
langchain[patch]: fixed a bug that was causing the streaming transfer to not work… (#10827)
… properly

Fixed a bug that was causing the streaming transfer to not work
properly.
 - **Description: 
1、The on_llm_new_token method in the streaming callback can now be
called properly in streaming transfer mode.
2、In streaming transfer mode, LLM can now correctly output the complete
response instead of just the first token.
- **Tag maintainer: @wangxuqi 
- **Twitter handle: @kGX7XJjuYxzX9Km

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
10 months ago
Eugene Yurtsev 6d0209e0aa
Improve file system blob loader and generic loader (#14004)
* Add support for passing a specific file to the file system blob loader
* Allow specifying a class parameter for the parser for the generic
loader

```python

class AudioLoader(GenericLoader):
  @staticmethod
  def get_parser(**kwargs):
     return MyAudioParser(**kwargs):
```

The intent of the GenericLoader is to provide on-ramps from different
sources (e.g., web, s3, file system).

An alternative is to use pipelining syntax or creating a Pipeline

```
FileSystemBlobLoader(...) | MyAudioParser
```

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
10 months ago
Lance Martin cbe4753e1a
Update Open CLIP embd (#14155)
Prior default model required a large amt of RAM and often crashed
Jupyter ntbk kernel.
10 months ago
Amyh102 b6d26d3f9f
infra[patch]: Add unit tests for Huggingface dataset loader (#14053)
- **Description:** Add unit tests for huggingface dataset loader and
sample huggingface dataset for future tests. Updates dependencies for
`datasets` module.
- Adds coverage for [previous pull
request](https://github.com/langchain-ai/langchain/pull/13864)
  - **Tag maintainer:** @hwchase17

---------

Co-authored-by: Amy Han <amyhan@Amys-Air.lan>
Co-authored-by: Amy Han <amyhan@Amys-MacBook-Air.local>
Co-authored-by: Bagatur <baskaryan@gmail.com>
10 months ago
Govinda Totla 62a3473ac0
docs[patch]: add text_splitter.py test (#14025)
Description: Add HTMLHeaderTextSplitter unit test
Dependencies: none
10 months ago
axiangcoding 1b36ddf16c
docs[patch]: add deprecated note for ErnieChatBot (#14061)
- **Description:** just a little change of ErnieChatBot class
description, sugguesting user to use more suitable class
  - **Issue:** none,
  - **Dependencies:** none,
  - **Tag maintainer:** @baskaryan ,
  - **Twitter handle:** none
10 months ago
Devin Dahoon Kim 32da0a4d71
langchain[patch]: use async_embed_with_retry in _aget_len_safe_embeddings (#14110)
**Description**

`embed_with_retry` is for sync operations and not for async operations.
Use `async_embed_with_retry` for appropriate async operations.


I'm using `OpenAIEmbedding(http_client=httpx.AsyncClient())` with only
async operations.
However, I got an error when I use `embedding.aembed_documents` because
`embed_with_retry` uses sync OpenAI client with async http client.
10 months ago
lijie 371bcb7580
langchain[patch]: set maxsplit when parse python function docstring (#14121)
Description

when the desc of arg in python docstring contains ":", the
`_parse_python_function_docstring` will raise **ValueError: too many
values to unpack (expected 2)**.

A sample desc would be:
"""
Args: 
    error_arg: this is an arg with an additional ":" symbol
"""

So, set `maxsplit` parameter to fix it.
10 months ago
Harrison Chase ae646701c4
Harrison/ibm (#14133)
Co-authored-by: Mateusz Szewczyk <139469471+MateuszOssGit@users.noreply.github.com>
10 months ago
Eugene Yurtsev 943aa01c14
Improve indexing performance for Postgres (remote database) for refresh for async API (#14132)
This PR speeds up the indexing api on the async path by batching the uid
updates in the sql record manager (which may be remote).
10 months ago
William FH 528fc76d6a
Update Prompt Format Error (#14044)
The number of times I try to format a string (especially in lcel) is
embarrassingly high. Think this may be more actionable than the default
error message. Now I get nice helpful errors


```
KeyError: "Input to ChatPromptTemplate is missing variable 'input'.  Expected: ['input'] Received: ['dialogue']"
```
10 months ago
William FH 71c2e184b4
[Nits] Evaluation - Some Rendering Improvements (#14097)
- Improve rendering of aggregate results at the end
- flatten reference if present
10 months ago
Mark Scannell 9b0e46dcf0
Improve indexing performance for Postgres (remote database) for refresh (#14126)
**Description:** By combining the document timestamp refresh within a
single call to update(), this enables batching of multiple documents in
a single SQL statement. This is important for non-local databases where
tens of milliseconds has a huge impact on performance when doing
document-by-document SQL statements.
**Issue:** #11935 
**Dependencies:** None
**Tag maintainer:** @eyurtsev
10 months ago
Jacob Lee 3328507f11
langchain[patch], experimental[minor]: Adds OllamaFunctions wrapper (#13330)
CC @baskaryan @hwchase17 @jmorganca 

Having a bit of trouble importing `langchain_experimental` from a
notebook, will figure it out tomorrow

~Ah and also is blocked by #13226~

---------

Co-authored-by: Lance Martin <lance@langchain.dev>
Co-authored-by: Bagatur <baskaryan@gmail.com>
10 months ago
Bagatur 4063bf144a
langchain[patch]: release 0.0.344 (#14095) 10 months ago
Bagatur efce352d6b
core[patch]: release 0.0.8 (#14086) 10 months ago
Harutaka Kawamura 0d08a692a3
langchain[minor]: Migrate mlflow and databricks classes to deployments APIs. (#13699)
## Description

Related to https://github.com/mlflow/mlflow/pull/10420. MLflow AI
gateway will be deprecated and replaced by the `mlflow.deployments`
module. Happy to split this PR if it's too large.

```
pip install git+https://github.com/langchain-ai/langchain.git@refs/pull/13699/merge#subdirectory=libs/langchain
```

## Dependencies

Install mlflow from https://github.com/mlflow/mlflow/pull/10420:

```
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/10420/merge
```

## Testing plan

The following code works fine on local and databricks:

<details><summary>Click</summary>
<p>

```python
"""
Setup
-----
mlflow deployments start-server --config-path examples/gateway/openai/config.yaml
databricks secrets create-scope <scope>
databricks secrets put-secret <scope> openai-api-key --string-value $OPENAI_API_KEY

Run
---
python /path/to/this/file.py secrets/<scope>/openai-api-key
"""
from langchain.chat_models import ChatMlflow, ChatDatabricks
from langchain.embeddings import MlflowEmbeddings, DatabricksEmbeddings
from langchain.llms import Databricks, Mlflow
from langchain.schema.messages import HumanMessage
from langchain.chains.loading import load_chain
from mlflow.deployments import get_deploy_client
import uuid
import sys
import tempfile
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

###############################
# MLflow
###############################
chat = ChatMlflow(
    target_uri="http://127.0.0.1:5000", endpoint="chat", params={"temperature": 0.1}
)
print(chat([HumanMessage(content="hello")]))

embeddings = MlflowEmbeddings(target_uri="http://127.0.0.1:5000", endpoint="embeddings")
print(embeddings.embed_query("hello")[:3])
print(embeddings.embed_documents(["hello", "world"])[0][:3])

llm = Mlflow(
    target_uri="http://127.0.0.1:5000",
    endpoint="completions",
    params={"temperature": 0.1},
)
print(llm("I am"))

llm_chain = LLMChain(
    llm=llm,
    prompt=PromptTemplate(
        input_variables=["adjective"],
        template="Tell me a {adjective} joke",
    ),
)
print(llm_chain.run(adjective="funny"))

# serialization/deserialization
with tempfile.TemporaryDirectory() as tmpdir:
    print(tmpdir)
    path = f"{tmpdir}/llm.yaml"
    llm_chain.save(path)
    loaded_chain = load_chain(path)
    print(loaded_chain("funny"))

###############################
# Databricks
###############################
secret = sys.argv[1]
client = get_deploy_client("databricks")

# External - chat
name = f"chat-{uuid.uuid4()}"
client.create_endpoint(
    name=name,
    config={
        "served_entities": [
            {
                "name": "test",
                "external_model": {
                    "name": "gpt-4",
                    "provider": "openai",
                    "task": "llm/v1/chat",
                    "openai_config": {
                        "openai_api_key": "{{" + secret + "}}",
                    },
                },
            }
        ],
    },
)
try:
    chat = ChatDatabricks(
        target_uri="databricks", endpoint=name, params={"temperature": 0.1}
    )
    print(chat([HumanMessage(content="hello")]))
finally:
    client.delete_endpoint(endpoint=name)

# External - embeddings
name = f"embeddings-{uuid.uuid4()}"
client.create_endpoint(
    name=name,
    config={
        "served_entities": [
            {
                "name": "test",
                "external_model": {
                    "name": "text-embedding-ada-002",
                    "provider": "openai",
                    "task": "llm/v1/embeddings",
                    "openai_config": {
                        "openai_api_key": "{{" + secret + "}}",
                    },
                },
            }
        ],
    },
)
try:
    embeddings = DatabricksEmbeddings(target_uri="databricks", endpoint=name)
    print(embeddings.embed_query("hello")[:3])
    print(embeddings.embed_documents(["hello", "world"])[0][:3])
finally:
    client.delete_endpoint(endpoint=name)

# External - completions
name = f"completions-{uuid.uuid4()}"
client.create_endpoint(
    name=name,
    config={
        "served_entities": [
            {
                "name": "test",
                "external_model": {
                    "name": "gpt-3.5-turbo-instruct",
                    "provider": "openai",
                    "task": "llm/v1/completions",
                    "openai_config": {
                        "openai_api_key": "{{" + secret + "}}",
                    },
                },
            }
        ],
    },
)
try:
    llm = Databricks(
        endpoint_name=name,
        model_kwargs={"temperature": 0.1},
    )
    print(llm("I am"))
finally:
    client.delete_endpoint(endpoint=name)


# Foundation model - chat
chat = ChatDatabricks(
    endpoint="databricks-llama-2-70b-chat", params={"temperature": 0.1}
)
print(chat([HumanMessage(content="hello")]))

# Foundation model - embeddings
embeddings = DatabricksEmbeddings(endpoint="databricks-bge-large-en")
print(embeddings.embed_query("hello")[:3])

# Foundation model - completions
llm = Databricks(
    endpoint_name="databricks-mpt-7b-instruct", model_kwargs={"temperature": 0.1}
)
print(llm("hello"))
llm_chain = LLMChain(
    llm=llm,
    prompt=PromptTemplate(
        input_variables=["adjective"],
        template="Tell me a {adjective} joke",
    ),
)
print(llm_chain.run(adjective="funny"))

# serialization/deserialization
with tempfile.TemporaryDirectory() as tmpdir:
    print(tmpdir)
    path = f"{tmpdir}/llm.yaml"
    llm_chain.save(path)
    loaded_chain = load_chain(path)
    print(loaded_chain("funny"))

```

Output:

```
content='Hello! How can I assist you today?'
[-0.025058426, -0.01938856, -0.027781019]
[-0.025058426, -0.01938856, -0.027781019]
sorry, but I cannot continue the sentence as it is incomplete. Can you please provide more information or context?
Sure, here's a classic one for you:

Why don't scientists trust atoms?

Because they make up everything!
/var/folders/dz/cd_nvlf14g9g__n3ph0d_0pm0000gp/T/tmpx_4no6ad
{'adjective': 'funny', 'text': "Sure, here's a classic one for you:\n\nWhy don't scientists trust atoms?\n\nBecause they make up everything!"}
content='Hello! How can I assist you today?'
[-0.025058426, -0.01938856, -0.027781019]
[-0.025058426, -0.01938856, -0.027781019]
 a 23 year old female and I am currently studying for my master's degree
content="\nHello! It's nice to meet you. Is there something I can help you with or would you like to chat for a bit?"
[0.051055908203125, 0.007221221923828125, 0.003879547119140625]
[0.051055908203125, 0.007221221923828125, 0.003879547119140625]

hello back
 Well, I don't really know many jokes, but I do know this funny story...
/var/folders/dz/cd_nvlf14g9g__n3ph0d_0pm0000gp/T/tmp7_ds72ex
{'adjective': 'funny', 'text': " Well, I don't really know many jokes, but I do know this funny story..."}
```

</p>
</details>

The existing workflow doesn't break:

<details><summary>click</summary>
<p>

```python
import uuid

import mlflow
from mlflow.models import ModelSignature
from mlflow.types.schema import ColSpec, Schema


class MyModel(mlflow.pyfunc.PythonModel):
    def predict(self, context, model_input):
        return str(uuid.uuid4())


with mlflow.start_run():
    mlflow.pyfunc.log_model(
        "model",
        python_model=MyModel(),
        pip_requirements=["mlflow==2.8.1", "cloudpickle<3"],
        signature=ModelSignature(
            inputs=Schema(
                [
                    ColSpec("string", "prompt"),
                    ColSpec("string", "stop"),
                ]
            ),
            outputs=Schema(
                [
                    ColSpec(name=None, type="string"),
                ]
            ),
        ),
        registered_model_name=f"lang-{uuid.uuid4()}",
    )

# Manually create a serving endpoint with the registered model and run
from langchain.llms import Databricks

llm = Databricks(endpoint_name="<name>")
llm("hello")  # 9d0b2491-3d13-487c-bc02-1287f06ecae7
```

</p>
</details> 

## Follow-up tasks

(This PR is too large. I'll file a separate one for follow-up tasks.)

- Update `docs/docs/integrations/providers/mlflow_ai_gateway.mdx` and
`docs/docs/integrations/providers/databricks.md`.

---------

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
10 months ago
Jeremy Naccache a14cf87576
core[patch]: Add **kwargs to Langchain's dumps() to allow passing of json.dumps() … (#10628)
…parameters.

In Langchain's `dumps()` function, I've added a `**kwargs` parameter.
This allows users to pass additional parameters to the underlying
`json.dumps()` function, providing greater flexibility and control over
JSON serialization.

Many parameters available in `json.dumps()` can be useful or even
necessary in specific situations. For example, when using an Agent with
return_intermediate_steps set to true, the output is a list of
AgentAction objects. These objects can't be serialized without using
Langchain's `dumps()` function.

The issue arises when using the Agent with a language other than
English, which may contain non-ASCII characters like 'é'. The default
behavior of `json.dumps()` sets ensure_ascii to true, converting
`{"name": "José"}` into `{"name": "Jos\u00e9"}`. This can make the
output hard to read, especially in the case of intermediate steps in
agent logs.

By allowing users to pass additional parameters to `json.dumps()` via
Langchain's dumps(), we can solve this problem. For instance, users can
set `ensure_ascii=False` to maintain the original characters.

This update also enables users to pass other useful `json.dumps()`
parameters like `sort_keys`, providing even more flexibility.

The implementation takes into account edge cases where a user might pass
a "default" parameter, which is already defined by `dumps()`, or an
"indent" parameter, which is also predefined if `pretty=True` is set.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
10 months ago
Yong woo Song f4d520ccb5
Fix .env file path in integration_test README.md (#14028)
<!-- Thank you for contributing to LangChain!



Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

### Description
Hello, 

The [integration_test
README](https://github.com/langchain-ai/langchain/tree/master/libs/langchain/tests)
was indicating incorrect paths for the `.env.example` and `.env` files.

`tests/.env.example` ->`tests/integration_tests/.env.example`

While it’s a minor error, it could **potentially lead to confusion** for
the document’s readers, so I’ve made the necessary corrections.

Thank you! ☺️

### Related Issue
- https://github.com/langchain-ai/langchain/pull/2806
10 months ago
Rohan Dey 41a4c06a94
Added support for a Pandas DataFrame OutputParser (#13257)
**Description:**

Added support for a Pandas DataFrame OutputParser with format
instructions, along with unit tests and a demo notebook. Namely, we've
added the ability to request data from a DataFrame, have the LLM parse
the request, and then use that request to retrieve a well-formatted
response.

Within LangChain, it seamlessly integrates with language models like
OpenAI's `text-davinci-003`, facilitating streamlined interaction using
the format instructions (just like the other output parsers).

This parser structures its requests as
`<operation/column/row>[<optional_array_params>]`. The instructions
detail permissible operations, valid columns, and array formats,
ensuring clarity and adherence to the required format.

For example:

- When the LLM receives the input: "Retrieve the mean of `num_legs` from
rows 1 to 3."
- The provided format instructions guide the LLM to structure the
request as: "mean:num_legs[1..3]".

The parser processes this formatted request, leveraging the LLM's
understanding to extract the mean of `num_legs` from rows 1 to 3 within
the Pandas DataFrame.

This integration allows users to communicate requests naturally, with
the LLM transforming these instructions into structured commands
understood by the `PandasDataFrameOutputParser`. The format instructions
act as a bridge between natural language queries and precise DataFrame
operations, optimizing communication and data retrieval.

**Issue:**

- https://github.com/langchain-ai/langchain/issues/11532

**Dependencies:**

No additional dependencies :)

**Tag maintainer:**

@baskaryan 

**Twitter handle:**

No need. :)

---------

Co-authored-by: Wasee Alam <waseealam@protonmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
10 months ago
Masanori Taniguchi 235bdb9fa7
Support Vald secure connection (#13269)
**Description:** 
When using Vald, only insecure grpc connection was supported, so secure
connection is now supported.
In addition, grpc metadata can be added to Vald requests to enable
authentication with a token.

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
10 months ago
sudranga d1d693b2a7
Fix issue where response_if_no_docs_found is not implemented on async… (#13297)
Response_if_no_docs_found is not implemented in
ConversationalRetrievalChain for async code paths. Implemented it and
added test cases

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
10 months ago
AthulVincent 67c55cb5b0
Implemented MongoDB Atlas Self-Query Retriever (#13321)
# Description 
This PR implements Self-Query Retriever for MongoDB Atlas vector store.

I've implemented the comparators and operators that are supported by
MongoDB Atlas vector store according to the section titled "Atlas Vector
Search Pre-Filter" from
https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-stage/.

Namely:
```
allowed_comparators = [
      Comparator.EQ,
      Comparator.NE,
      Comparator.GT,
      Comparator.GTE,
      Comparator.LT,
      Comparator.LTE,
      Comparator.IN,
      Comparator.NIN,
  ]

"""Subset of allowed logical operators."""
allowed_operators = [
    Operator.AND,
    Operator.OR
]
```
Translations from comparators/operators to MongoDB Atlas filter
operators(you can find the syntax in the "Atlas Vector Search
Pre-Filter" section from the previous link) are done using the following
dictionary:
```
map_dict = {
            Operator.AND: "$and",
            Operator.OR: "$or",
            Comparator.EQ: "$eq",
            Comparator.NE: "$ne",
            Comparator.GTE: "$gte",
            Comparator.LTE: "$lte",
            Comparator.LT: "$lt",
            Comparator.GT: "$gt",
            Comparator.IN: "$in",
            Comparator.NIN: "$nin",
        }
```

In visit_structured_query() the filters are passed as "pre_filter" and
not "filter" as in the MongoDB link above since langchain's
implementation of MongoDB atlas vector
store(libs\langchain\langchain\vectorstores\mongodb_atlas.py) in
_similarity_search_with_score() sets the "filter" key to have the value
of the "pre_filter" argument.
```
params["filter"] = pre_filter
```
Test cases and documentation have also been added.

# Issue
#11616 

# Dependencies
No new dependencies have been added.

# Documentation
I have created the notebook mongodb_atlas_self_query.ipynb outlining the
steps to get the self-query mechanism working.

I worked closely with [@Farhan-Faisal](https://github.com/Farhan-Faisal)
on this PR.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
10 months ago
Josef Zoller c2e3963da4
Merriam-Webster Dictionary Tool (#12044)
# Description

We implemented a simple tool for accessing the Merriam-Webster
Collegiate Dictionary API
(https://dictionaryapi.com/products/api-collegiate-dictionary).

Here's a simple usage example:

```py
from langchain.llms import OpenAI
from langchain.agents import load_tools, initialize_agent, AgentType

llm = OpenAI()
tools = load_tools(["serpapi", "merriam-webster"], llm=llm) # Serp API gives our agent access to Google
agent = initialize_agent(
  tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
agent.run("What is the english word for the german word Himbeere? Define that word.")
```

Sample output:

```
> Entering new AgentExecutor chain...
 I need to find the english word for Himbeere and then get the definition of that word.
Action: Search
Action Input: "English word for Himbeere"
Observation: {'type': 'translation_result'}
Thought: Now I have the english word, I can look up the definition.
Action: MerriamWebster
Action Input: raspberry
Observation: Definitions of 'raspberry':

1. rasp-ber-ry, noun: any of various usually black or red edible berries that are aggregate fruits consisting of numerous small drupes on a fleshy receptacle and that are usually rounder and smaller than the closely related blackberries
2. rasp-ber-ry, noun: a perennial plant (genus Rubus) of the rose family that bears raspberries
3. rasp-ber-ry, noun: a sound of contempt made by protruding the tongue between the lips and expelling air forcibly to produce a vibration; broadly : an expression of disapproval or contempt
4. black raspberry, noun: a raspberry (Rubus occidentalis) of eastern North America that has a purplish-black fruit and is the source of several cultivated varieties —called also blackcap

Thought: I now know the final answer.
Final Answer: Raspberry is an english word for Himbeere and it is defined as any of various usually black or red edible berries that are aggregate fruits consisting of numerous small drupes on a fleshy receptacle and that are usually rounder and smaller than the closely related blackberries.

> Finished chain.
```

# Issue

This closes #12039.

# Dependencies

We added no extra dependencies.

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Lara <63805048+larkgz@users.noreply.github.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
10 months ago
Mohammad Mohtashim f3dd4a10cf
DROP BOX Loader Documentation Update (#14047)
- **Description:** Update the document for drop box loader + made the
messages more verbose when loading pdf file since people were getting
confused
  - **Issue:** #13952
  - **Tag maintainer:** @baskaryan, @eyurtsev, @hwchase17,

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
10 months ago
Cheng (William) Huang a00db4b28f
Add multi-input Reddit search tool (#13893)
- **Description:** Added a tool called RedditSearchRun and an
accompanying API wrapper, which searches Reddit for posts with support
for time filtering, post sorting, query string and subreddit filtering.
  - **Issue:** #13891 
  - **Dependencies:** `praw` module is used to search Reddit
- **Tag maintainer:** @baskaryan , and any of the other maintainers if
needed
  - **Twitter handle:** None.

  Hello,

This is our first PR and we hope that our changes will be helpful to the
community. We have run `make format`, `make lint` and `make test`
locally before submitting the PR. To our knowledge, our changes do not
introduce any new errors.

Our PR integrates the `praw` package which is already used by
RedditPostsLoader in LangChain. Nonetheless, we have added integration
tests and edited unit tests to test our changes. An example notebook is
also provided. These changes were put together by me, @Anika2000,
@CharlesXu123, and @Jeremy-Cheng-stack

Thank you in advance to the maintainers for their time.

---------

Co-authored-by: What-Is-A-Username <49571870+What-Is-A-Username@users.noreply.github.com>
Co-authored-by: Anika2000 <anika.sultana@mail.utoronto.ca>
Co-authored-by: Jeremy Cheng <81793294+Jeremy-Cheng-stack@users.noreply.github.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
10 months ago
Jawad Arshad 00a6e8962c
langchain[minor]: Add serpapi tools (#13934)
- **Description:** Added some of the more endpoints supported by serpapi
that are not suported on langchain at the moment, like google trends,
google finance, google jobs, and google lens
- **Issue:** [Add support for many of the querying endpoints with
serpapi #11811](https://github.com/langchain-ai/langchain/issues/11811)

---------

Co-authored-by: zushenglu <58179949+zushenglu@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: Ian Xu <ian.xu@mail.utoronto.ca>
Co-authored-by: zushenglu <zushenglu1809@gmail.com>
Co-authored-by: KevinT928 <96837880+KevinT928@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
10 months ago
h3l dbaeb163aa
langchain[minor]: add volcengine endpoint as LLM (#13942)
- **Description:** Volc Engine MaaS serves as an enterprise-grade,
large-model service platform designed for developers. You can visit its
homepage at https://www.volcengine.com/docs/82379/1099455 for details.
This change will facilitate developers to integrate quickly with the
platform.
  - **Issue:** None
  - **Dependencies:** volcengine
  - **Tag maintainer:** @baskaryan 
  - **Twitter handle:** @he1v3tica

---------

Co-authored-by: lvzhong <lvzhong@bytedance.com>
10 months ago
Mohammad Ahmad 1600ebe6c7
langchain[patch]: Mask API key for ForeFrontAI LLM (#14013)
- **Description:** Mask API key for ForeFrontAI LLM and associated unit
tests
  - **Issue:** https://github.com/langchain-ai/langchain/issues/12165
  - **Dependencies:** N/A
  - **Tag maintainer:** @eyurtsev 
  - **Twitter handle:** `__mmahmad__`

I made the API key non-optional since linting required adding validation
for None, but the key is required per documentation:
https://python.langchain.com/docs/integrations/llms/forefrontai
10 months ago
yoch a0e859df51
langchain[patch]: fix cohere reranker init #12899 (#14029)
- **Description:** use post field validation for `CohereRerank`
  - **Issue:** #12899 and #13058
  - **Dependencies:** 
  - **Tag maintainer:** @baskaryan

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
10 months ago
123-fake-st 9bd6e9df36
update pdf document loaders' metadata source to url for online pdf (#13274)
- **Description:** Update 5 pdf document loaders in
`langchain.document_loaders.pdf`, to store a url in the metadata
(instead of a temporary, local file path) if the user provides a web
path to a pdf: `PyPDFium2Loader`, `PDFMinerLoader`,
`PDFMinerPDFasHTMLLoader`, `PyMuPDFLoader`, and `PDFPlumberLoader` were
updated.
- The updates follow the approach used to update `PyPDFLoader` for the
same behavior in #12092
- The `PyMuPDFLoader` changes required additional work in updating
`langchain.document_loaders.parsers.pdf.PyMuPDFParser` to be able to
process either an `io.BufferedReader` (from local pdf) or `io.BytesIO`
(from online pdf)
- The `PDFMinerPDFasHTMLLoader` change used a simpler approach since the
metadata is assigned by the loader and not the parser
  - **Issue:** Fixes #7034
  - **Dependencies:** None


```python
# PyPDFium2Loader example:
# old behavior
>>> from langchain.document_loaders import PyPDFium2Loader
>>> loader = PyPDFium2Loader('https://arxiv.org/pdf/1706.03762.pdf')
>>> docs = loader.load()
>>> docs[0].metadata
{'source': '/var/folders/7z/d5dt407n673drh1f5cm8spj40000gn/T/tmpm5oqa92f/tmp.pdf', 'page': 0}

# new behavior
>>> from langchain.document_loaders import PyPDFium2Loader
>>> loader = PyPDFium2Loader('https://arxiv.org/pdf/1706.03762.pdf')
>>> docs = loader.load()
>>> docs[0].metadata
{'source': 'https://arxiv.org/pdf/1706.03762.pdf', 'page': 0}
```
10 months ago
Toshish Jawale 6f64cb5078
Remove deprecated param and flexibility for prompt (#13310)
- **Description:** Updated to remove deprecated parameter penalty_alpha,
and use string variation of prompt rather than json object for better
flexibility. - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** N/A
  - **Tag maintainer:** @eyurtsev
  - **Twitter handle:** @symbldotai

---------

Co-authored-by: toshishjawale <toshish@symbl.ai>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
10 months ago
Tomaz Bratanic 3eb391561b
langchain[minor]: Reduce the number of tokens required to describe a Cypher/Neo4j schema (#13851)
Instead of using JSON-like syntax to describe node and relationship
properties we changed to a shorter and more concise schema description

Old:

```
        Node properties are the following:
        [{'properties': [{'property': 'name', 'type': 'STRING'}], 'labels': 'Movie'}, {'properties': [{'property': 'name', 'type': 'STRING'}], 'labels': 'Actor'}]
        Relationship properties are the following:
        []
        The relationships are the following:
        ['(:Actor)-[:ACTED_IN]->(:Movie)']
```

New:

```
Node properties are the following:
Movie {name: STRING},Actor {name: STRING}
Relationship properties are the following:

The relationships are the following:
(:Actor)-[:ACTED_IN]->(:Movie)
```
10 months ago
Sauhaard 7ec4dbeb80
langchain[minor]: Add StackExchange API integration (#14002)
Implements
[#12115](https://github.com/langchain-ai/langchain/issues/12115)

Who can review?
@baskaryan , @eyurtsev , @hwchase17 

Integrated Stack Exchange API into Langchain, enabling access to diverse
communities within the platform. This addition enhances Langchain's
capabilities by allowing users to query Stack Exchange for specialized
information and engage in discussions. The integration provides seamless
interaction with Stack Exchange content, offering content from varied
knowledge repositories.

A notebook example and test cases were included to demonstrate the
functionality and reliability of this integration.

- Add StackExchange as a tool.
- Add unit test for the StackExchange wrapper and tool.
- Add documentation for the StackExchange wrapper and tool.

If you have time, could you please review the code and provide any
feedback as necessary! My team is welcome to any suggestions.

---------

Co-authored-by: Yuval Kamani <yuvalkamani@gmail.com>
Co-authored-by: Aryan Thakur <aryanthakur@Aryans-MacBook-Pro.local>
Co-authored-by: Manas1818 <79381912+manas1818@users.noreply.github.com>
Co-authored-by: aryan-thakur <61063777+aryan-thakur@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
10 months ago
Bagatur d4405bc94e
langchain[patch]: Release 0.0.343 (#14037) 10 months ago
Yves Zumbühl 9c0ad0cebb
langchain[patch]: Improve HyDe with custom prompts and ability to supply the run_manager (#14016)
- **Description:** The class allows to only select between a few
predefined prompts from the paper. That is not ideal, since other use
cases might need a custom prompt. The changes made allow for this. To be
able to monitor those, I also added functionality to supply a custom
run_manager.
  - **Issue:** no issue, but a new feature,
  - **Dependencies:** none,
  - **Tag maintainer:** @hwchase17,
  - **Twitter handle:** @yvesloy

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
10 months ago
Chad Norvell 1c4bfb8c5f
langchain[patch]: Mathpix PDF loader supports arbitrary extra params (#13950)
- **Description:** Support providing whatever extra parameters you want
to the Mathpix PDF loader API request.
  - **Issue:** #12773
  - **Dependencies:** None

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
10 months ago
Unai Garay Maestre 9e2ae866c4
langchain[patch]: Adds progress bar to GooglePalmEmbeddings (#13812)
- **Description:** Adds a tqdm progress bar to GooglePalmEmbeddings when
embedding a list.
  - **Issue:** #13637
  - **Dependencies:** TQDM as a main dependency (instead of extra)


Signed-off-by: ugm2 <unaigaraymaestre@gmail.com>

---------

Signed-off-by: ugm2 <unaigaraymaestre@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
10 months ago