Commit Graph

2113 Commits

Author SHA1 Message Date
kavinraj A S
ab6b41937a
Fixed a typo in smart_llm prompt (#13052)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-12-05 19:16:18 -08:00
jeffpezzone
7c2ef06136
Adds "NIN" metadata filter for pgvector to all checking for set absence (#14205)
This PR adds support for metadata filters of the form:

`{"filter": {"key": { "NIN" : ["list", "of", "values"]}}}`

"IN" is already supported, so this is a quick & related update to add
"NIN"
2023-12-05 19:07:33 -08:00
lif
20d2b4a6ba
feat: Increased compatibility with new and old versions for dalle (#14222)
- **Description:** Increased compatibility with all versions openai for
dalle,

This pr add support for openai version from 0 ~ 1.3.
2023-12-05 17:31:28 -08:00
Wang Wei
7205bfdd00
feat: 1. Add system parameters, 2. Align with the QianfanChatEndpoint for function calling (#14275)
- **Description:** 
1. Add system parameters to the ERNIE LLM API to set the role of the
LLM.
2. Add support for the ERNIE-Bot-turbo-AI model according from the
document https://cloud.baidu.com/doc/WENXINWORKSHOP/s/Alp0kdm0n.
3. For the function call of ErnieBotChat, align with the
QianfanChatEndpoint.

With this PR, the `QianfanChatEndpoint()` can use the `function calling`
ability with `create_ernie_fn_chain()`. The example is as the following:

```
from langchain.prompts import ChatPromptTemplate
import json
from langchain.prompts.chat import (
    ChatPromptTemplate,
)

from langchain.chat_models import QianfanChatEndpoint
from langchain.chains.ernie_functions import (
    create_ernie_fn_chain,
)

def get_current_news(location: str) -> str:
    """Get the current news based on the location.'

    Args:
        location (str): The location to query.
    
    Returs:
        str: Current news based on the location.
    """

    news_info = {
        "location": location,
        "news": [
            "I have a Book.",
            "It's a nice day, today."
        ]
    }

    return json.dumps(news_info)

def get_current_weather(location: str, unit: str="celsius") -> str:
    """Get the current weather in a given location

    Args:
        location (str): location of the weather.
        unit (str): unit of the tempuature.
    
    Returns:
        str: weather in the given location.
    """

    weather_info = {
        "location": location,
        "temperature": "27",
        "unit": unit,
        "forecast": ["sunny", "windy"],
    }
    return json.dumps(weather_info)

template = ChatPromptTemplate.from_messages([
    ("user", "{user_input}"),
])

chat = QianfanChatEndpoint(model="ERNIE-Bot-4")
chain = create_ernie_fn_chain([get_current_weather, get_current_news], chat, template, verbose=True)
res = chain.run("北京今天的新闻是什么?")
print(res)
```

The result of the above code:
```
> Entering new LLMChain chain...
Prompt after formatting:
Human: 北京今天的新闻是什么?
> Finished chain.
{'name': 'get_current_news', 'arguments': {'location': '北京'}}
```

For the `ErnieBotChat`, now can use the `system` parameter to set the
role of the LLM.

```
from langchain.prompts import ChatPromptTemplate
from langchain.chains import LLMChain
from langchain.chat_models import ErnieBotChat

llm = ErnieBotChat(model_name="ERNIE-Bot-turbo-AI", system="你是一个能力很强的机器人,你的名字叫 小叮当。无论问你什么问题,你都可以给出答案。")
prompt = ChatPromptTemplate.from_messages(
    [
        ("human", "{query}"),
    ]
)
chain = LLMChain(llm=llm, prompt=prompt, verbose=True)
res = chain.run(query="你是谁?")
print(res)
```

The result of the above code:

```
> Entering new LLMChain chain...
Prompt after formatting:
Human: 你是谁?
> Finished chain.
我是小叮当,一个智能机器人。我可以为你提供各种服务,包括回答问题、提供信息、进行计算等。如果你需要任何帮助,请随时告诉我,我会尽力为你提供最好的服务。
```
2023-12-05 17:28:31 -08:00
Leonid Kuligin
fd5be55a7b
added get_num_tokens to GooglePalm (#14282)
added get_num_tokens to GooglePalm + a little bit of refactoring
2023-12-05 17:24:19 -08:00
Massimiliano Pronesti
c215a4c9ec
feat(embeddings): text-embeddings-inference (#14288)
- **Description:** Added a notebook to illustrate how to use
`text-embeddings-inference` from huggingface. As
`HuggingFaceHubEmbeddings` was using a deprecated client, I made the
most of this PR updating that too.

- **Issue:** #13286 

- **Dependencies**: None

- **Tag maintainer:** @baskaryan
2023-12-05 17:22:05 -08:00
Tim Van Wassenhove
85b88c33f3
Fixes issue-14295: Correctly pass along the kwargs (#14296)
- **Description:** Update code to correctly pass the kwargs 
  - **Issue:** #14295 
  - **Dependencies:**  - 
  - **Tag maintainer:** 

<--
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

#issue-14295
2023-12-05 17:14:00 -08:00
Jarkko Lagus
667ad6a5de
Add support for CORS options for AzureSearch (#14305)
- **Description:** Add support for setting the CORS options when using
AzureSearch indexes
2023-12-05 16:05:40 -08:00
Karim Assi
9401539e43
Allow not enforcing function usage when a single function is passed to openai function executable (#14308)
- **Description:** allows not enforcing function usage when a single
function is passed to an openAI function executable (or corresponding
legacy chain). This is a desired feature in the case where the model
does not have enough information to call a function, and needs to get
back to the user.
  - **Issue:** N/A
  - **Dependencies:** N/A
  - **Tag maintainer:** N/A
2023-12-05 15:56:31 -08:00
Ran
d22c13ec48
Mask API key for Minimax LLM (#14309)
- **Description:** Added masking for the API key for Minimax LLM + tests
inspired by https://github.com/langchain-ai/langchain/pull/12418.
- **Issue:** the issue # fixes
https://github.com/langchain-ai/langchain/issues/12165
- **Dependencies:** this fix is dependent on Minimax instantiation fix
which is introduced in
https://github.com/langchain-ai/langchain/pull/13439, so merge this one
after.
  - **Tag maintainer:** @eyurtsev

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-12-05 15:42:00 -08:00
Eugene Yurtsev
a74c03da3c
Add metadata to blob (#14162)
Add metadata to the blob object. This makes it easier
to make a pipeline that properly propagates metadata information
from raw content to the derived content.
2023-12-05 17:17:41 -05:00
Lance Martin
66848871fc
Multi-modal RAG template (#14186)
* OpenCLIP embeddings
* GPT-4V

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-12-05 13:36:38 -08:00
James Braza
3b75d37cee
Adding BaseChatMessageHistory.__str__ (#14311)
Adding __str__ to base chat message history to make it easier to debug
2023-12-05 16:22:31 -05:00
James Braza
8b0060184d
Fixing empty input variable crashing PromptTemplate validations (#14314)
- Fixes `input_variables=[""]` crashing validations with a template
`"{}"`
- Uses `__cause__` for proper `Exception` chaining in
`check_valid_template`
2023-12-05 13:13:08 -08:00
Bagatur
6607cc6eab
experimental[patch]: Release 0.0.44 (#14310) 2023-12-05 12:11:42 -08:00
Eugene Yurtsev
80637727ea
hide api key: arcee (#14304)
Hide API key for Arcee

---------

Co-authored-by: raphael <raph.nunes95@gmail.com>
2023-12-05 14:49:55 -05:00
Bagatur
b2e756c0a8
langchain[patch]: Release 0.0.346 (#14307) 2023-12-05 11:38:52 -08:00
Bagatur
4a5a13aab3
core[patch]: Release 0.0.10 (#14303) 2023-12-05 10:20:57 -08:00
Eun Hye Kim
f758c8adc4
Fix #11737 issue (extra_tools option of create_pandas_dataframe_agent is not working) (#13203)
- **Description:** Fix #11737 issue (extra_tools option of
create_pandas_dataframe_agent is not working),
  - **Issue:** #11737 ,
  - **Dependencies:** no,
- **Tag maintainer:** @baskaryan, @eyurtsev, @hwchase17 I needed this
method at work, so I modified it myself and used it. There is a similar
issue(#11737) and PR(#13018) of @PyroGenesis, so I combined my code at
the original PR.
You may be busy, but it would be great help for me if you checked. Thank
you.
  - **Twitter handle:** @lunara_x 

If you need an .ipynb example about this, please tag me. 
I will share what I am working on after removing any work-related
content.

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-12-04 20:54:08 -08:00
Sean Bearden
77a15fa988
Added ability to pass arguments to the Playwright browser (#13146)
- **Description:** Enhanced `create_sync_playwright_browser` and
`create_async_playwright_browser` functions to accept a list of
arguments. These arguments are now forwarded to
`browser.chromium.launch()` for customizable browser instantiation.
  - **Issue:** #13143
  - **Dependencies:** None
  - **Tag maintainer:** @eyurtsev,
  - **Twitter handle:** Dr_Bearden

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-12-04 20:48:09 -08:00
Joan Fontanals
dcccf8fa66
adapt Jina Embeddings to new Jina AI Embedding API (#13658)
- **Description:** Adapt JinaEmbeddings to run with the new Jina AI
Embedding platform
- **Twitter handle:** https://twitter.com/JinaAI_

---------

Co-authored-by: Joan Fontanals Martinez <joan.fontanals.martinez@jina.ai>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-12-04 20:40:33 -08:00
guillaumedelande
ea0afd07ca
Update azuresearch.py following recent change from azure-search-documents library (#13472)
- **Description:** 

Reference library azure-search-documents has been adapted in version
11.4.0:

1. Notebook explaining Azure AI Search updated with most recent info
2. HnswVectorSearchAlgorithmConfiguration --> HnswAlgorithmConfiguration
3. PrioritizedFields(prioritized_content_fields) -->
SemanticPrioritizedFields(content_fields)
4. SemanticSettings --> SemanticSearch
5. VectorSearch(algorithm_configurations) -->
VectorSearch(configurations)

--> Changes now reflected on Langchain: default vector search config
from langchain is now compatible with officially released library from
Azure.

  - **Issue:**
Issue creating a new index (due to wrong class used for default vector
search configuration) if using latest version of azure-search-documents
with current langchain version
  - **Dependencies:** azure-search-documents>=11.4.0,
  - **Tag maintainer:** ,

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-12-04 20:29:20 -08:00
price-deshaw
5cb3393e20
update OpenAI function agents' llm validation (#13538)
- **Description:** This PR modifies the LLM validation in OpenAI
function agents to check whether the LLM supports OpenAI functions based
on a property (`supports_oia_functions`) instead of whether the LLM
passed to the agent `isinstance` of `ChatOpenAI`. This allows classes
that extend `BaseChatModel` to be passed to these agents as long as
they've been integrated with the OpenAI APIs and have this property set,
even if they don't extend `ChatOpenAI`.
  - **Issue:** N/A
  - **Dependencies:** none
2023-12-04 20:28:13 -08:00
Max Weng
74c7b799ef
migrate openai audio api (#13557)
for issue https://github.com/langchain-ai/langchain/issues/13162
migrate openai audio api, as [openai v1.0.0 Migration
Guide](https://github.com/openai/openai-python/discussions/742)

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Double Max <max@ground-map.com>
2023-12-04 20:27:54 -08:00
Arnaud Gelas
abbba6c7d8
openapi/planner.py: Deal with json in markdown output cases (#13576)
- **Description:** In openapi/planner deal with json in markdown output
cases
- **Issue:** In some cases LLMs could return json in markdown which
can't be loaded.
  - **Dependencies:**
  - **Tag maintainer:** @eyurtsev
  - **Twitter handle:**

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-12-04 20:27:22 -08:00
Harrison Chase
8eab4d95c0
Harrison/delegate from template (#14266)
Co-authored-by: M.R. Sopacua <144725145+msopacua@users.noreply.github.com>
2023-12-04 20:18:15 -08:00
Nolan
b49104c2c9
Add missing doc key to metadata field in AzureSearch Vectorstore (#13328)
- **Description:** Adds doc key to metadata field when adding document
to Azure Search.
  - **Issue:** -,
  - **Dependencies:** -,
  - **Tag maintainer:** @eyurtsev,
  - **Twitter handle:** @finnless

Right now the document key with the name FIELDS_ID is not included in
the FIELDS_METADATA field, and therefore is not included in the Document
returned from a query. This is really annoying if you want to be able to
modify that item in the vectorstore.

Other's thoughts on this are welcome.
2023-12-04 19:53:27 -08:00
Jon Watte
e042e5df35
fix: call _on_llm_error() (#13581)
Description: There's a copy-paste typo where on_llm_error() calls
_on_chain_error() instead of _on_llm_error().
Issue: #13580 
Dependencies: None
Tag maintainer: @hwchase17 
Twitter handle: @jwatte

"Run `make format`, `make lint` and `make test` to check this locally."
The test scripts don't work in a plain Ubuntu LTS 20.04 system.
It looks like the dev container pulling is stuck. Or maybe the internet
is just ornery today.

---------

Co-authored-by: jwatte <jwatte@observeinc.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-12-04 19:44:50 -08:00
Hamza Ahmed
fcc8e5e839
Update geodataframe.py (#13573)
here it is validating shapely.geometry.point.Point: if not
isinstance(data_frame[page_content_column].iloc[0], gpd.GeoSeries):
raise ValueError(
f"Expected data_frame[{page_content_column}] to be a GeoSeries" you need
it to validate the geoSeries and not the shapely.geometry.point.Point

if not isinstance(data_frame[page_content_column], gpd.GeoSeries):
            raise ValueError(
f"Expected data_frame[{page_content_column}] to be a GeoSeries"

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-12-04 19:44:30 -08:00
Harrison Chase
2213fc9711
Harrison/bookend ai (#14258)
Co-authored-by: stvhu-bookend <142813359+stvhu-bookend@users.noreply.github.com>
2023-12-04 19:42:15 -08:00
cxumol
0d47d15a9f
add(feat): Text Embeddings by Cloudflare Workers AI (#14220)
Add [Text Embeddings by Cloudflare Workers
AI](https://developers.cloudflare.com/workers-ai/models/text-embeddings/).
It's a new integration.
Trying to align it with its langchain-js version counterpart
[here](https://api.js.langchain.com/classes/embeddings_cloudflare_workersai.CloudflareWorkersAIEmbeddings.html).
- Dependencies: N/A
- Done `make format` `make lint` `make spell_check` `make
integration_tests` and all my changes was passed
2023-12-04 19:25:05 -08:00
Harrison Chase
c51001f01e
fix comet tracer (#14259) 2023-12-04 19:03:19 -08:00
Harrison Chase
4fb72ff76f
fake consistent embeddings cleanup (#14256)
delete code that could never be reached
2023-12-04 16:55:30 -08:00
Michael Landis
e26906c1dc
feat: implement max marginal relevance for momento vector index (#13619)
**Description**

Implements `max_marginal_relevance_search` and
`max_marginal_relevance_search_by_vector` for the Momento Vector Index
vectorstore.

Additionally bumps the `momento` dependency in the lock file and adds
logging to the implementation.

**Dependencies**

 updates `momento` dependency in lock file

**Tag maintainer**

@baskaryan 

**Twitter handle**

Please tag @momentohq for Momento Vector Index and @mloml for the
contribution 🙇

<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
2023-12-04 16:50:23 -08:00
deedy5
ee9abb6722
Bugfix duckduckgo_search news search (#13670)
- **Description:** 
Bugfix duckduckgo_search news search
  - **Issue:** 
https://github.com/langchain-ai/langchain/issues/13648
  - **Dependencies:** 
None
  - **Tag maintainer:** 
@baskaryan

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-12-04 16:48:20 -08:00
Aliaksandr Kuzmik
676a077c4e
Add CometTracer (#13661)
Hi! I'm Alex, Python SDK Team Lead from
[Comet](https://www.comet.com/site/).

This PR contains our new integration between langchain and Comet -
`CometTracer` class which uses new `comet_llm` python package for
submitting data to Comet.

No additional dependencies for the langchain package are required
directly, but if the user wants to use `CometTracer`, `comet-llm>=2.0.0`
should be installed. Otherwise an exception will be raised from
`CometTracer.__init__`.

A test for the feature is included.

There is also an already existing callback (and .ipynb file with
example) which ideally should be deprecated in favor of a new tracer. I
wasn't sure how exactly you'd prefer to do it. For example we could open
a separate PR for that.

I'm open to your ideas :)
2023-12-04 16:46:48 -08:00
Harrison Chase
921c4b5597
Harrison/searchapi (#14252)
Co-authored-by: SebastjanPrachovskij <86522260+SebastjanPrachovskij@users.noreply.github.com>
2023-12-04 16:34:15 -08:00
Colin Ulin
9f9cb71d26
Embaas - added backoff retries for network requests (#13679)
Running a large number of requests to Embaas' servers (or any server)
can result in intermittent network failures (both from local and
external network/service issues). This PR implements exponential backoff
retries to help mitigate this issue.
2023-12-04 16:21:35 -08:00
Kastan Day
65faba91ad
langchain[patch]: Adding new Github functions for reading pull requests (#9027)
The Github utilities are fantastic, so I'm adding support for deeper
interaction with pull requests. Agents should read "regular" comments
and review comments, and the content of PR files (with summarization or
`ctags` abbreviations).

Progress:
- [x] Add functions to read pull requests and the full content of
modified files.
- [x] Function to use Github's built in code / issues search.

Out of scope:
- Smarter summarization of file contents of large pull requests (`tree`
output, or ctags).
- Smarter functions to checkout PRs and edit the files incrementally
before bulk committing all changes.
- Docs example for creating two agents:
- One watches issues: For every new issue, open a PR with your best
attempt at fixing it.
- The other watches PRs: For every new PR && every new comment on a PR,
check the status and try to finish the job.

<!-- Thank you for contributing to LangChain!

Replace this comment with:
  - Description: a description of the change, 
  - Issue: the issue # it fixes (if applicable),
  - Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!

Please make sure you're PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
  2. an example notebook showing its use.

Maintainer responsibilities:
  - General / Misc / if you don't know who to tag: @baskaryan
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
  - Models / Prompts: @hwchase17, @baskaryan
  - Memory: @hwchase17
  - Agents / Tools / Toolkits: @hinthornw
  - Tracing / Callbacks: @agola11
  - Async: @agola11

If no one reviews your PR within a few days, feel free to @-mention the
same people again.

See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
 -->

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-12-04 15:53:36 -08:00
Hynek Kydlíček
aa8ae31e5b
core[patch]: add response kwarg to on_llm_error
# Dependencies
None

# Twitter handle
@HKydlicek

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-12-04 15:04:48 -08:00
Jacob Lee
a26c4a0930
Allow base_store to be used directly with MultiVectorRetriever (#14202)
Allow users to pass a generic `BaseStore[str, bytes]` to
MultiVectorRetriever, removing the need to use the `create_kv_docstore`
method. This encoding will now happen internally.

@rlancemartin @eyurtsev

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2023-12-04 14:43:32 -08:00
Vincent Brouwers
67662564f3
langchain[patch]: Fix config arg detection for wrapped lambdarunnable (#14230)
**Description:**
When a RunnableLambda only receives a synchronous callback, this
callback is wrapped into an async one since #13408. However, this
wrapping with `(*args, **kwargs)` causes the `accepts_config` check at
[/libs/core/langchain_core/runnables/config.py#L342](ee94ef55ee/libs/core/langchain_core/runnables/config.py (L342))
to fail, as this checks for the presence of a "config" argument in the
method signature.

Adding a `functools.wraps` around it, resolves it.
2023-12-04 14:18:30 -08:00
Jacob Lee
de86b84a70
Prefer byte store interface for Upstash BaseStore to match other Redis (#14201)
If we are not going to make the existing Docstore class also implement
`BaseStore[str, Document]`, IMO all base store implementations should
always be `[str, bytes]` so that they are more interchangeable.

CC @rlancemartin @eyurtsev
2023-12-04 14:17:33 -08:00
Harrison Chase
411aa9a41e
Harrison/nasa tool (#14245)
Co-authored-by: Jacob Matias <88005863+matiasjacob25@users.noreply.github.com>
Co-authored-by: Karam Daid <karam.daid@mail.utoronto.ca>
Co-authored-by: Jumana <jumana.fanous@mail.utoronto.ca>
Co-authored-by: KaramDaid <38271127+KaramDaid@users.noreply.github.com>
Co-authored-by: Anna Chester <74325334+CodeMakesMeSmile@users.noreply.github.com>
Co-authored-by: Jumana <144748640+jfanous@users.noreply.github.com>
2023-12-04 13:43:11 -08:00
nceccarelli
5fea63327b
Support Azure gov cloud in Azure Cognitive Search retriever (#13695)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
- **Description:** The existing version hardcoded search.windows.net in
the base url. This is not compatible with the gov cloud. I am allowing
the user to override the default for gov cloud support.,
  - **Issue:** N/A, did not write up in an issue,
  - **Dependencies:** None

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

---------

Co-authored-by: Nicholas Ceccarelli <nceccarelli2@moog.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-12-04 12:56:35 -08:00
ealt
e09b876863
Fixes error loading Obsidian templates (#13888)
- **Description:** Obsidian templates can include
[variables](https://help.obsidian.md/Plugins/Templates#Template+variables)
using double curly braces. `ObsidianLoader` uses PyYaml to parse the
frontmatter of documents. This parsing throws an error when encountering
variables' curly braces. This is avoided by temporarily substituting
safe strings before parsing.
  - **Issue:** #13887
  - **Tag maintainer:** @hwchase17
2023-12-04 12:55:37 -08:00
Nithish Raghunandanan
eecfa3f9e5
Add Couchbase document loader (#13979)
**Description:** 
Adds the document loader for [Couchbase](http://couchbase.com/), a
distributed NoSQL database.
**Dependencies:** 
Added the Couchbase SDK as an optional dependency.
**Twitter handle:** nithishr

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-12-04 12:28:12 -08:00
Muntaqa Mahmood
25f72944a0
Add: Steam API tool (#14008)
- **Description:** Our PR is an integration of a Steam API Tool that
makes recommendations on steam games based on user's Steam profile and
provides information on games based on user provided queries.
- **Issue:** the issue # our PR implements:
https://github.com/langchain-ai/langchain/issues/12120
- **Dependencies:** python-steam-api library, steamspypi library and
decouple library
  - **Tag maintainer:** @baskaryan, @hwchase17 
  - **Twitter handle:** N/A

Hello langchain Maintainers,

We are a team of 4 University of Toronto students contributing to
langchain as part of our course [CSCD01 (link to course
page)](https://cscd01.com/work/open-source-project). We hope our changes
help the community. We have run make format, make lint and make test
locally before submitting the PR. To our knowledge, our changes do not
introduce any new errors.

Our PR integrates the python-steam-api, steamspypi and decouple
packages. We have added integration tests to test our python API
integration into langchain and an example notebook is also provided.

Our amazing team that contributed to this PR: @JohnY2002, @shenceyang,
@andrewqian2001 and @muntaqamahmood

Thank you in advance to all the maintainers for reviewing our PR!

---------

Co-authored-by: Shence <ysc1412799032@163.com>
Co-authored-by: JohnY2002 <johnyuan0526@gmail.com>
Co-authored-by: Andrew Qian <andrewqian2001@gmail.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: JohnY <94477598+JohnY2002@users.noreply.github.com>
2023-12-04 12:27:38 -08:00
Bob Lin
cd2028288e
Add openai v2 adapter (#14063)
### Description

Starting from [openai version
1.0.0](17ac677995 (module-level-client)),
the camel case form of `openai.ChatCompletion` is no longer supported
and has been changed to lowercase `openai.chat.completions`. In
addition, the returned object only accepts attribute access instead of
index access:

```python
import openai

# optional; defaults to `os.environ['OPENAI_API_KEY']`
openai.api_key = '...'

# all client options can be configured just like the `OpenAI` instantiation counterpart
openai.base_url = "https://..."
openai.default_headers = {"x-foo": "true"}

completion = openai.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "user",
            "content": "How do I output all files in a directory using Python?",
        },
    ],
)
print(completion.choices[0].message.content)
```

So I implemented a compatible adapter that supports both attribute
access and index access:

```python
In [1]: from langchain.adapters import openai as lc_openai
   ...: messages = [{"role": "user", "content": "hi"}]

In [2]: result = lc_openai.chat.completions.create(
   ...:     messages=messages, model="gpt-3.5-turbo", temperature=0
   ...: )

In [3]: result.choices[0].message
Out[3]: {'role': 'assistant', 'content': 'Hello! How can I assist you today?'}

In [4]: result["choices"][0]["message"]
Out[4]: {'role': 'assistant', 'content': 'Hello! How can I assist you today?'}

In [5]: result = await lc_openai.chat.completions.acreate(
   ...:     messages=messages, model="gpt-3.5-turbo", temperature=0
   ...: )

In [6]: result.choices[0].message
Out[6]: {'role': 'assistant', 'content': 'Hello! How can I assist you today?'}

In [7]: result["choices"][0]["message"]
Out[7]: {'role': 'assistant', 'content': 'Hello! How can I assist you today?'}

In [8]: for rs in lc_openai.chat.completions.create(
    ...:     messages=messages, model="gpt-3.5-turbo", temperature=0, stream=True
    ...: ):
    ...:     print(rs.choices[0].delta)
    ...:     print(rs["choices"][0]["delta"])
    ...:
{'role': 'assistant', 'content': ''}
{'role': 'assistant', 'content': ''}
{'content': 'Hello'}
{'content': 'Hello'}
{'content': '!'}
{'content': '!'}

In [20]: async for rs in await lc_openai.chat.completions.acreate(
    ...:     messages=messages, model="gpt-3.5-turbo", temperature=0, stream=True
    ...: ):
    ...:     print(rs.choices[0].delta)
    ...:     print(rs["choices"][0]["delta"])
    ...:
{'role': 'assistant', 'content': ''}
{'role': 'assistant', 'content': ''}
{'content': 'Hello'}
{'content': 'Hello'}
{'content': '!'}
{'content': '!'}
...
```

### Twitter handle

[lin_bob57617](https://twitter.com/lin_bob57617)
2023-12-04 12:12:30 -08:00
billytrend-cohere
0f02081392
Add input_type override (#14068)
Add option to override input_type for cohere's v3 embeddings models

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-12-04 12:10:24 -08:00
Dmitrii Rashchenko
aaabc1574f
Support of custom hugging face inference endpoints url (#14125)
- **Description:** to support not only publicly available Hugging Face
endpoints, but also protected ones (created with "Inference Endpoints"
Hugging Face feature), I have added ability to specify custom api_url.
But if not specified, default behaviour won't change
  - **Issue:** #9181,
  - **Dependencies:** no extra dependencies
2023-12-04 12:08:51 -08:00
Harrison Chase
e32185193e
Harrison/embass (#14242)
Co-authored-by: Julius Lipp <lipp.julius@gmail.com>
2023-12-04 11:58:52 -08:00
umair mehmood
8504ec56e4
fixed: ModuleNotFoundError: No module named 'clarifai.auth' (#14215)
Updated the clarifai imports 

fixed: #14175 

@efriis 
@baskaryan
2023-12-04 11:53:34 -08:00
Hieu Lam
ca8a022cd9
Fixed OpenAIFunctionsAgent not returning when receiving AgentFinish (#14236)
**Description:** The way the condition is checked in the
`return_stopped_response` function of `OpenAIAgent` may not be correct,
when the value returned is `AgentFinish` from the tools it does not work
properly.


Thanks for review, @baskaryan, @eyurtsev, @hwchase17.
2023-12-04 11:43:04 -08:00
Unai Garay Maestre
6826feea14
Adds llm_chain_kwargs to BaseRetrievalQA.from_llm (#14224)
- **Description:** Adds `llm_chain_kwargs` to `BaseRetrievalQA.from_llm`
so these can be passed to the LLM at runtime,
- **Issue:** https://github.com/langchain-ai/langchain/issues/14216,

---------

Signed-off-by: ugm2 <unaigaraymaestre@gmail.com>
2023-12-04 11:34:01 -08:00
James Braza
6ce5dab38c
Clarifying descriptions in GuardrailsOutputParser (#14228)
Upstreaming knowledge from
https://github.com/guardrails-ai/guardrails/discussions/473 to LangChain
2023-12-04 11:33:22 -08:00
geret1
50aee687c6
langchain[patch]: Cerebrium model_api_request deprecation (#12704)
- **Description:** As part of my conversation with Cerebrium team,
`model_api_request` will be no longer available in cerebrium lib so it
needs to be replaced.
  - **Issue:** #12705 12705,
  - **Dependencies:** Cerebrium team (agreed)
  - **Tag maintainer:** @eyurtsev 
  - **Twitter handle:** No official Twitter account sorry :D

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-12-04 09:26:32 -08:00
William FH
246dc4f9cc
langchain[patch]: Pass kwargs to chat fireworks (#14183)
Otherwise `.bind()` isn't really any good
2023-12-03 15:12:02 -08:00
Kaiboon Ee
e961c57fd2
langchain[patch]: Mask API key for Arcee LLM (#14193)
- **Description:** Mask API key for Arcee LLM and its associated unit
tests
  - **Issue:** https://github.com/langchain-ai/langchain/issues/12165
  - **Dependencies:** N/A
  - **Tag maintainer:** @eyurtsev
  - **Twitter handle:** `eekaiboon`

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-12-03 15:11:43 -08:00
Daniyar Supiyev
092f302c0f
langchain[patch]: Asynchronous human-in-the-loop callback (#14195)
**Description:** Adding a possibility to use asynchronous callback
handler in human-in-the-loop validation tool. Very useful, for example,
if you want to implement a validation over Telegram bot.
**Issue:** -
**Dependencies:** -

---------

Co-authored-by: Daniyar_Supiyev <daniyar_supiyev@epam.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-12-03 14:57:07 -08:00
Mark Cusack
16c83f786c
Adds the Yellowbrick Data Warehouse as a supported vector store (#13820)
- **Description** An integration to allow the Yellowbrick Data Warehouse
to function as a vector store

---------

Co-authored-by: markcusack <markcusack@markcusacksmac.lan>
Co-authored-by: markcusack <markcusack@Mark-Cusack-sMac.local>
2023-12-03 13:35:53 -08:00
Hendrik Hogertz
e6862e6e7d
Fix Azure Openai function calling in streaming mode (#13768)
- **Description**: This PR addresses an issue with the OpenAI API
streaming response, where initially the key (arguments) is provided but
the value is None. Subsequently, it updates with {"arguments": "{\n"},
leading to a type inconsistency that causes an exception. The specific
error encountered is ValueError: additional_kwargs["arguments"] already
exists in this message, but with a different type. This change aims to
resolve this inconsistency and ensure smooth API interactions.
- **Issue**: None.
- **Dependencies**: None.
- **Tag maintainer**: @eyurtsev

This is an updated version of #13229 based on the refactored code.
Credit goes to @superken01.

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-12-03 12:07:15 -08:00
Nicolò Boschi
e204657b3c
AstraDB VectorStore: implement pre_delete_collection (#13780)
- **Description:** some vector stores have a flag for try deleting the
collection before creating it (such as ´vectorpg´). This is a useful
flag when prototyping indexing pipelines and also for integration tests.
Added the bool flag `pre_delete_collection ` to the constructor (default
False)
  - **Tag maintainer:** @hemidactylus 
  - **Twitter handle:** nicoloboschi

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-12-03 12:06:20 -08:00
Chelsea E. Manning
2780d2d4dd
Extend OpenAIEmbeddings class to support non-tiktoken based embeddings (#13884)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
- **Description:** This extends `OpenAIEmbeddings` to add support for
non-`tiktoken` based embeddings, specifically for use with the new
`text-generation-webui` API (`--extensions openai`) which does not
support `tiktoken` encodings, but rather strings
  - **Issue:** Not found,
- **Dependencies:** HuggingFace `transformers.AutoTokenizer` is new
dependency for running the model without `tiktoken`
- **Tag maintainer:** @baskaryan based on last commit for
`langchain-core` refactor
  - **Twitter handle:** @xychelsea

Modified the tokenization process to be model-agnostic, allowing for
both OpenAI and non-OpenAI model tokenizations, by setting the new
default `bool` flag `tiktoken_enabled` to `False`. This requeires
HuggingFace’s AutoTokenizer and handling tokenization for models
requiring different preprocessing steps to generate a chunked string
request rather than a list of integers.

Updated the embeddings generation process to accommodate non-OpenAI
models. This includes converting tokenized text into embeddings using
OpenAI’s and Hugging Face’s model architectures.
 -->
2023-12-03 12:04:17 -08:00
Changgeng Zhao
9b59bde93d
Update Hologres vector store: use hologres-vector (#13767)
Hi,
I made some code changes on the Hologres vector store to improve the
data insertion performance.
Also, this version of the code uses `hologres-vector` library. This
library is more convenient for us to update, and more efficient in
performance.
The code has passed the format/lint/spell check. I have run the unit
test for Hologres connecting to my own database.
Please check this PR again and tell me if anything needs to change.

Best,
Changgeng,
Developer @ Alibaba Cloud

Co-authored-by: Changgeng Zhao <zhaochanggeng.zcg@alibaba-inc.com>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-12-03 11:50:45 -08:00
Nicolò Boschi
0de7cf898d
Ensure AstraDB integration tests clean up the environment (#13774)
- **Description:** currently astra_db integration tests might leave
orphan collections
  - **Tag maintainer:** @hemidactylus 
  - **Twitter handle:** nicoloboschi
2023-12-03 11:14:42 -08:00
Chad Norvell
8a0951d934
Fix Mathpix PDF loader integration (#13949)
- **Description:** Fixes the Mathpix PDF loader API integration.
Specifically, ensures that Mathpix auth headers are provided for every
request, and ensures that we recognize all errors that can occur during
a request. Also, the option to provide API keys as kwargs never actually
worked before, but now that's fixed too.
  - **Issue:** #11249
  - **Dependencies:** None
2023-12-03 10:36:49 -08:00
gzyJoy
32d4bb4590
Added Slacktoolkit (#14012)
- **Description:** 
This PR introduces the Slack toolkit to LangChain, which allows users to
read and write to Slack using the Slack API. Specifically, we've added
the following tools.
1. get_channel: Provides a summary of all the channels in a workspace.
2. get_message: Gets the message history of a channel.
3. send_message: Sends a message to a channel.
4. schedule_message: Sends a message to a channel at a specific time and
date.

- **Issue:** This pull request addresses [Add Slack Toolkit
#11747](https://github.com/langchain-ai/langchain/issues/11747)
  - **Dependencies:** package`slack_sdk`
Note: For this toolkit to function you will need to add a Slack app to
your workspace. Additional info can be found
[here](https://slack.com/help/articles/202035138-Add-apps-to-your-Slack-workspace).

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: ArianneLavada <ariannelavada@gmail.com>
Co-authored-by: ArianneLavada <84357335+ArianneLavada@users.noreply.github.com>
Co-authored-by: ariannelavada@gmail.com <you@example.com>
2023-12-03 10:25:38 -08:00
Richie
99e5ee6a84
fix(vectorstores): incorrect import for mongodb atlas DriverInfo (#14060)
- **Description:** fix `import` issue for `mongodb atlas` vectore store
integration
  - **Issue:** none
  - **Dependencies:** none

while trying to follow official `langchain`'s [mongodb integration
guide](https://python.langchain.com/docs/integrations/vectorstores/mongodb_atlas),
an import error will happen.

It's caused by incorrect import location:
- `from pymongo import DriverInfo` should be `from pymongo.driver_info
import DriverInfo`
- reference: [pymongo's DriverInfo
class](https://pymongo.readthedocs.io/en/stable/api/pymongo/driver_info.html#pymongo.driver_info.DriverInfo)

Thanks!
2023-12-03 10:22:13 -08:00
James Braza
3833882ab7
Removing extra StdOutCallbackHandler overridden methods (#14136)
Unnecessarily overridden methods:

- Give the idea the subclass is doing something special (when it isn't)
- Block CTRL-click to the actual method

This PR removes some unnecessarily overridden methods in
`StdOutCallbackHandler`

Supercedes https://github.com/langchain-ai/langchain/pull/12858
2023-12-03 09:38:49 -08:00
James Braza
052e23be3e
Added Python logging tracer (#14190)
This PR creates a logging handler and adds a simple unit test of it

Supercedes https://github.com/langchain-ai/langchain/pull/12862

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2023-12-03 09:36:30 -08:00
Bob Lin
62505043be
Closed #14069 (#14166)
### Description

Fix #14069

### Twitter handle

[lin_bob57617](https://twitter.com/lin_bob57617)
2023-12-03 08:55:25 -08:00
Yong woo Song
9938086df0
Fix Html2TextTransformer for shallow copy (#14197)
<!-- Thank you for contributing to LangChain!

Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->
Hi,
There is some unintended behavior in Html2TextTransformer.
The current code is **directly modifying the original documents that are
passed as arguments to the function.**
Therefore, not only the return of the function but also the input
variables are being modified simultaneously.
**To resolve this, I added unit test code as well.**

reference link: [Shallow vs Deep Copying of Python
Objects](https://realpython.com/copying-python-objects/)

Thanks! ☺️
2023-12-03 08:45:35 -08:00
h3l
818252b1f8
Fix: (issue #14127) Volc Engine MaaS import error (#14194)
- **Description:** fix Volc Engine MaaS import error
- **Issue:** [the issue # it fixes (if
applicable),](https://github.com/langchain-ai/langchain/issues/14127)
  - **Dependencies:** None
  - **Tag maintainer:** @baskaryan 
  - **Twitter handle:**

Co-authored-by: lvzhong <lvzhong@bytedance.com>
2023-12-03 08:43:23 -08:00
Bagatur
0bdb434383
langchain[patch]: Release langchain 0.0.345 (#14184) 2023-12-02 15:53:49 -08:00
Bagatur
15c04a5670
core[patch]: Release 0.0.9 (#14182) 2023-12-02 14:40:56 -08:00
James Braza
bdb6ae2ed3
core[patch]: BaseTracer helper method for Run lookup (#14139)
I observed the same run ID extraction logic is repeated many times in
`BaseTracer`.

This PR creates a helper method for DRY code.
2023-12-02 14:05:50 -08:00
Harutaka Kawamura
41ee3be95f
langchain[patch]: Support passing parameters to llms.Databricks and llms.Mlflow (#14100)
Before, we need to use `params` to pass extra parameters:

```python
from langchain.llms import Databricks

Databricks(..., params={"temperature": 0.0})
```

Now, we can directly specify extra params:

```python
from langchain.llms import Databricks

Databricks(..., temperature=0.0)
```
2023-12-01 19:27:18 -08:00
Abdul
82102c99b3
langchain[patch]: Running SQLDatabaseChain adds prefix "SQLQuery:\n" (#14058)
- **Issue:** https://github.com/langchain-ai/langchain/issues/12077

---------

Co-authored-by: Abdul Kader Maliyakkal <maliyakk@amazon.com>
2023-12-01 19:26:16 -08:00
Samuel Kemp
fd781c89cc
langchain[minor]: add azure ai data document loader (#13404)
This PR adds an "Azure AI data" document loader, which allows Azure AI
users to load their registered data assets as a document object in
langchain.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-12-01 19:25:55 -08:00
James Braza
24385a00de
core[minor], langchain[patch], experimental[patch]: Added missing py.typed to langchain_core (#14143)
See PR title.

From what I can see, `poetry` will auto-include this. Please let me know
if I am missing something here.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-12-01 19:15:23 -08:00
quantum00549
f7c257553d
langchain[patch]: fixed a bug that was causing the streaming transfer to not work… (#10827)
… properly

Fixed a bug that was causing the streaming transfer to not work
properly.
 - **Description: 
1、The on_llm_new_token method in the streaming callback can now be
called properly in streaming transfer mode.
2、In streaming transfer mode, LLM can now correctly output the complete
response instead of just the first token.
- **Tag maintainer: @wangxuqi 
- **Twitter handle: @kGX7XJjuYxzX9Km

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-12-01 18:57:50 -08:00
Eugene Yurtsev
6d0209e0aa
Improve file system blob loader and generic loader (#14004)
* Add support for passing a specific file to the file system blob loader
* Allow specifying a class parameter for the parser for the generic
loader

```python

class AudioLoader(GenericLoader):
  @staticmethod
  def get_parser(**kwargs):
     return MyAudioParser(**kwargs):
```

The intent of the GenericLoader is to provide on-ramps from different
sources (e.g., web, s3, file system).

An alternative is to use pipelining syntax or creating a Pipeline

```
FileSystemBlobLoader(...) | MyAudioParser
```

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-12-01 21:23:40 -05:00
Lance Martin
cbe4753e1a
Update Open CLIP embd (#14155)
Prior default model required a large amt of RAM and often crashed
Jupyter ntbk kernel.
2023-12-01 15:13:20 -08:00
Amyh102
b6d26d3f9f
infra[patch]: Add unit tests for Huggingface dataset loader (#14053)
- **Description:** Add unit tests for huggingface dataset loader and
sample huggingface dataset for future tests. Updates dependencies for
`datasets` module.
- Adds coverage for [previous pull
request](https://github.com/langchain-ai/langchain/pull/13864)
  - **Tag maintainer:** @hwchase17

---------

Co-authored-by: Amy Han <amyhan@Amys-Air.lan>
Co-authored-by: Amy Han <amyhan@Amys-MacBook-Air.local>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-12-01 12:42:31 -08:00
Govinda Totla
62a3473ac0
docs[patch]: add text_splitter.py test (#14025)
Description: Add HTMLHeaderTextSplitter unit test
Dependencies: none
2023-12-01 11:57:50 -08:00
axiangcoding
1b36ddf16c
docs[patch]: add deprecated note for ErnieChatBot (#14061)
- **Description:** just a little change of ErnieChatBot class
description, sugguesting user to use more suitable class
  - **Issue:** none,
  - **Dependencies:** none,
  - **Tag maintainer:** @baskaryan ,
  - **Twitter handle:** none
2023-12-01 11:16:31 -08:00
Devin Dahoon Kim
32da0a4d71
langchain[patch]: use async_embed_with_retry in _aget_len_safe_embeddings (#14110)
**Description**

`embed_with_retry` is for sync operations and not for async operations.
Use `async_embed_with_retry` for appropriate async operations.


I'm using `OpenAIEmbedding(http_client=httpx.AsyncClient())` with only
async operations.
However, I got an error when I use `embedding.aembed_documents` because
`embed_with_retry` uses sync OpenAI client with async http client.
2023-12-01 10:47:07 -08:00
lijie
371bcb7580
langchain[patch]: set maxsplit when parse python function docstring (#14121)
Description

when the desc of arg in python docstring contains ":", the
`_parse_python_function_docstring` will raise **ValueError: too many
values to unpack (expected 2)**.

A sample desc would be:
"""
Args: 
    error_arg: this is an arg with an additional ":" symbol
"""

So, set `maxsplit` parameter to fix it.
2023-12-01 10:46:53 -08:00
Harrison Chase
ae646701c4
Harrison/ibm (#14133)
Co-authored-by: Mateusz Szewczyk <139469471+MateuszOssGit@users.noreply.github.com>
2023-12-01 12:44:11 -05:00
Eugene Yurtsev
943aa01c14
Improve indexing performance for Postgres (remote database) for refresh for async API (#14132)
This PR speeds up the indexing api on the async path by batching the uid
updates in the sql record manager (which may be remote).
2023-12-01 12:10:07 -05:00
William FH
528fc76d6a
Update Prompt Format Error (#14044)
The number of times I try to format a string (especially in lcel) is
embarrassingly high. Think this may be more actionable than the default
error message. Now I get nice helpful errors


```
KeyError: "Input to ChatPromptTemplate is missing variable 'input'.  Expected: ['input'] Received: ['dialogue']"
```
2023-12-01 09:06:35 -08:00
William FH
71c2e184b4
[Nits] Evaluation - Some Rendering Improvements (#14097)
- Improve rendering of aggregate results at the end
- flatten reference if present
2023-12-01 09:06:07 -08:00
Mark Scannell
9b0e46dcf0
Improve indexing performance for Postgres (remote database) for refresh (#14126)
**Description:** By combining the document timestamp refresh within a
single call to update(), this enables batching of multiple documents in
a single SQL statement. This is important for non-local databases where
tens of milliseconds has a huge impact on performance when doing
document-by-document SQL statements.
**Issue:** #11935 
**Dependencies:** None
**Tag maintainer:** @eyurtsev
2023-12-01 11:36:02 -05:00
Jacob Lee
3328507f11
langchain[patch], experimental[minor]: Adds OllamaFunctions wrapper (#13330)
CC @baskaryan @hwchase17 @jmorganca 

Having a bit of trouble importing `langchain_experimental` from a
notebook, will figure it out tomorrow

~Ah and also is blocked by #13226~

---------

Co-authored-by: Lance Martin <lance@langchain.dev>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-11-30 16:13:57 -08:00
Bagatur
4063bf144a
langchain[patch]: release 0.0.344 (#14095) 2023-11-30 15:57:11 -08:00
Bagatur
efce352d6b
core[patch]: release 0.0.8 (#14086) 2023-11-30 15:12:06 -08:00
Harutaka Kawamura
0d08a692a3
langchain[minor]: Migrate mlflow and databricks classes to deployments APIs. (#13699)
## Description

Related to https://github.com/mlflow/mlflow/pull/10420. MLflow AI
gateway will be deprecated and replaced by the `mlflow.deployments`
module. Happy to split this PR if it's too large.

```
pip install git+https://github.com/langchain-ai/langchain.git@refs/pull/13699/merge#subdirectory=libs/langchain
```

## Dependencies

Install mlflow from https://github.com/mlflow/mlflow/pull/10420:

```
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/10420/merge
```

## Testing plan

The following code works fine on local and databricks:

<details><summary>Click</summary>
<p>

```python
"""
Setup
-----
mlflow deployments start-server --config-path examples/gateway/openai/config.yaml
databricks secrets create-scope <scope>
databricks secrets put-secret <scope> openai-api-key --string-value $OPENAI_API_KEY

Run
---
python /path/to/this/file.py secrets/<scope>/openai-api-key
"""
from langchain.chat_models import ChatMlflow, ChatDatabricks
from langchain.embeddings import MlflowEmbeddings, DatabricksEmbeddings
from langchain.llms import Databricks, Mlflow
from langchain.schema.messages import HumanMessage
from langchain.chains.loading import load_chain
from mlflow.deployments import get_deploy_client
import uuid
import sys
import tempfile
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

###############################
# MLflow
###############################
chat = ChatMlflow(
    target_uri="http://127.0.0.1:5000", endpoint="chat", params={"temperature": 0.1}
)
print(chat([HumanMessage(content="hello")]))

embeddings = MlflowEmbeddings(target_uri="http://127.0.0.1:5000", endpoint="embeddings")
print(embeddings.embed_query("hello")[:3])
print(embeddings.embed_documents(["hello", "world"])[0][:3])

llm = Mlflow(
    target_uri="http://127.0.0.1:5000",
    endpoint="completions",
    params={"temperature": 0.1},
)
print(llm("I am"))

llm_chain = LLMChain(
    llm=llm,
    prompt=PromptTemplate(
        input_variables=["adjective"],
        template="Tell me a {adjective} joke",
    ),
)
print(llm_chain.run(adjective="funny"))

# serialization/deserialization
with tempfile.TemporaryDirectory() as tmpdir:
    print(tmpdir)
    path = f"{tmpdir}/llm.yaml"
    llm_chain.save(path)
    loaded_chain = load_chain(path)
    print(loaded_chain("funny"))

###############################
# Databricks
###############################
secret = sys.argv[1]
client = get_deploy_client("databricks")

# External - chat
name = f"chat-{uuid.uuid4()}"
client.create_endpoint(
    name=name,
    config={
        "served_entities": [
            {
                "name": "test",
                "external_model": {
                    "name": "gpt-4",
                    "provider": "openai",
                    "task": "llm/v1/chat",
                    "openai_config": {
                        "openai_api_key": "{{" + secret + "}}",
                    },
                },
            }
        ],
    },
)
try:
    chat = ChatDatabricks(
        target_uri="databricks", endpoint=name, params={"temperature": 0.1}
    )
    print(chat([HumanMessage(content="hello")]))
finally:
    client.delete_endpoint(endpoint=name)

# External - embeddings
name = f"embeddings-{uuid.uuid4()}"
client.create_endpoint(
    name=name,
    config={
        "served_entities": [
            {
                "name": "test",
                "external_model": {
                    "name": "text-embedding-ada-002",
                    "provider": "openai",
                    "task": "llm/v1/embeddings",
                    "openai_config": {
                        "openai_api_key": "{{" + secret + "}}",
                    },
                },
            }
        ],
    },
)
try:
    embeddings = DatabricksEmbeddings(target_uri="databricks", endpoint=name)
    print(embeddings.embed_query("hello")[:3])
    print(embeddings.embed_documents(["hello", "world"])[0][:3])
finally:
    client.delete_endpoint(endpoint=name)

# External - completions
name = f"completions-{uuid.uuid4()}"
client.create_endpoint(
    name=name,
    config={
        "served_entities": [
            {
                "name": "test",
                "external_model": {
                    "name": "gpt-3.5-turbo-instruct",
                    "provider": "openai",
                    "task": "llm/v1/completions",
                    "openai_config": {
                        "openai_api_key": "{{" + secret + "}}",
                    },
                },
            }
        ],
    },
)
try:
    llm = Databricks(
        endpoint_name=name,
        model_kwargs={"temperature": 0.1},
    )
    print(llm("I am"))
finally:
    client.delete_endpoint(endpoint=name)


# Foundation model - chat
chat = ChatDatabricks(
    endpoint="databricks-llama-2-70b-chat", params={"temperature": 0.1}
)
print(chat([HumanMessage(content="hello")]))

# Foundation model - embeddings
embeddings = DatabricksEmbeddings(endpoint="databricks-bge-large-en")
print(embeddings.embed_query("hello")[:3])

# Foundation model - completions
llm = Databricks(
    endpoint_name="databricks-mpt-7b-instruct", model_kwargs={"temperature": 0.1}
)
print(llm("hello"))
llm_chain = LLMChain(
    llm=llm,
    prompt=PromptTemplate(
        input_variables=["adjective"],
        template="Tell me a {adjective} joke",
    ),
)
print(llm_chain.run(adjective="funny"))

# serialization/deserialization
with tempfile.TemporaryDirectory() as tmpdir:
    print(tmpdir)
    path = f"{tmpdir}/llm.yaml"
    llm_chain.save(path)
    loaded_chain = load_chain(path)
    print(loaded_chain("funny"))

```

Output:

```
content='Hello! How can I assist you today?'
[-0.025058426, -0.01938856, -0.027781019]
[-0.025058426, -0.01938856, -0.027781019]
sorry, but I cannot continue the sentence as it is incomplete. Can you please provide more information or context?
Sure, here's a classic one for you:

Why don't scientists trust atoms?

Because they make up everything!
/var/folders/dz/cd_nvlf14g9g__n3ph0d_0pm0000gp/T/tmpx_4no6ad
{'adjective': 'funny', 'text': "Sure, here's a classic one for you:\n\nWhy don't scientists trust atoms?\n\nBecause they make up everything!"}
content='Hello! How can I assist you today?'
[-0.025058426, -0.01938856, -0.027781019]
[-0.025058426, -0.01938856, -0.027781019]
 a 23 year old female and I am currently studying for my master's degree
content="\nHello! It's nice to meet you. Is there something I can help you with or would you like to chat for a bit?"
[0.051055908203125, 0.007221221923828125, 0.003879547119140625]
[0.051055908203125, 0.007221221923828125, 0.003879547119140625]

hello back
 Well, I don't really know many jokes, but I do know this funny story...
/var/folders/dz/cd_nvlf14g9g__n3ph0d_0pm0000gp/T/tmp7_ds72ex
{'adjective': 'funny', 'text': " Well, I don't really know many jokes, but I do know this funny story..."}
```

</p>
</details>

The existing workflow doesn't break:

<details><summary>click</summary>
<p>

```python
import uuid

import mlflow
from mlflow.models import ModelSignature
from mlflow.types.schema import ColSpec, Schema


class MyModel(mlflow.pyfunc.PythonModel):
    def predict(self, context, model_input):
        return str(uuid.uuid4())


with mlflow.start_run():
    mlflow.pyfunc.log_model(
        "model",
        python_model=MyModel(),
        pip_requirements=["mlflow==2.8.1", "cloudpickle<3"],
        signature=ModelSignature(
            inputs=Schema(
                [
                    ColSpec("string", "prompt"),
                    ColSpec("string", "stop"),
                ]
            ),
            outputs=Schema(
                [
                    ColSpec(name=None, type="string"),
                ]
            ),
        ),
        registered_model_name=f"lang-{uuid.uuid4()}",
    )

# Manually create a serving endpoint with the registered model and run
from langchain.llms import Databricks

llm = Databricks(endpoint_name="<name>")
llm("hello")  # 9d0b2491-3d13-487c-bc02-1287f06ecae7
```

</p>
</details> 

## Follow-up tasks

(This PR is too large. I'll file a separate one for follow-up tasks.)

- Update `docs/docs/integrations/providers/mlflow_ai_gateway.mdx` and
`docs/docs/integrations/providers/databricks.md`.

---------

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-11-30 15:06:58 -08:00
Jeremy Naccache
a14cf87576
core[patch]: Add **kwargs to Langchain's dumps() to allow passing of json.dumps() … (#10628)
…parameters.

In Langchain's `dumps()` function, I've added a `**kwargs` parameter.
This allows users to pass additional parameters to the underlying
`json.dumps()` function, providing greater flexibility and control over
JSON serialization.

Many parameters available in `json.dumps()` can be useful or even
necessary in specific situations. For example, when using an Agent with
return_intermediate_steps set to true, the output is a list of
AgentAction objects. These objects can't be serialized without using
Langchain's `dumps()` function.

The issue arises when using the Agent with a language other than
English, which may contain non-ASCII characters like 'é'. The default
behavior of `json.dumps()` sets ensure_ascii to true, converting
`{"name": "José"}` into `{"name": "Jos\u00e9"}`. This can make the
output hard to read, especially in the case of intermediate steps in
agent logs.

By allowing users to pass additional parameters to `json.dumps()` via
Langchain's dumps(), we can solve this problem. For instance, users can
set `ensure_ascii=False` to maintain the original characters.

This update also enables users to pass other useful `json.dumps()`
parameters like `sort_keys`, providing even more flexibility.

The implementation takes into account edge cases where a user might pass
a "default" parameter, which is already defined by `dumps()`, or an
"indent" parameter, which is also predefined if `pretty=True` is set.

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
2023-11-30 08:52:24 -08:00
Yong woo Song
f4d520ccb5
Fix .env file path in integration_test README.md (#14028)
<!-- Thank you for contributing to LangChain!



Replace this entire comment with:
  - **Description:** a description of the change, 
  - **Issue:** the issue # it fixes (if applicable),
  - **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!

Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

See contribution guidelines for more information on how to write/run
tests, lint, etc:

https://github.com/langchain-ai/langchain/blob/master/.github/CONTRIBUTING.md

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.

If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
 -->

### Description
Hello, 

The [integration_test
README](https://github.com/langchain-ai/langchain/tree/master/libs/langchain/tests)
was indicating incorrect paths for the `.env.example` and `.env` files.

`tests/.env.example` ->`tests/integration_tests/.env.example`

While it’s a minor error, it could **potentially lead to confusion** for
the document’s readers, so I’ve made the necessary corrections.

Thank you! ☺️

### Related Issue
- https://github.com/langchain-ai/langchain/pull/2806
2023-11-29 22:14:28 -05:00