Commit Graph

10038 Commits (68e0ae32864ae9251e447d7c580d9330131721e1)
 

Author SHA1 Message Date
Jacob Lee 0c2ebe5f47
docs[patch]: Standardize prerequisites in tutorial docs (#23150)
CC @baskaryan
2 weeks ago
bilk0h 3d54784e6d
text-splitters: Fix/recursive json splitter data persistence issue (#21529)
Thank you for contributing to LangChain!

**Description:** Noticed an issue with when I was calling
`RecursiveJsonSplitter().split_json()` multiple times that I was getting
weird results. I found an issue where `chunks` list in the `_json_split`
method. If chunks is not provided when _json_split (which is the case
when split_json calls _json_split) then the same list is used for
subsequent calls to `_json_split`.


You can see this in the test case i also added to this commit.

Output should be: 
```
[{'a': 1, 'b': 2}]
[{'c': 3, 'd': 4}]
```

Instead you get:
```
[{'a': 1, 'b': 2}]
[{'a': 1, 'b': 2, 'c': 3, 'd': 4}]
```

---------

Co-authored-by: Nuno Campos <nuno@langchain.dev>
Co-authored-by: isaac hershenson <ihershenson@hmc.edu>
Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
2 weeks ago
Yuki Watanabe 9ab7a6df39
docs: Overhaul Databricks components documentation (#22884)
**Description:** Documentation at
[integrations/llms/databricks](https://python.langchain.com/v0.2/docs/integrations/llms/databricks/)
is not up-to-date and includes examples about chat model and embeddings,
which should be located in the different corresponding subdirectories.
This PR split the page into correct scope and overhaul the contents.

**Note**: This PR might be hard to review on the diffs view, please use
the following preview links for the changed pages.
- `ChatDatabricks`:
https://langchain-git-fork-b-step62-chat-databricks-doc-langchain.vercel.app/v0.2/docs/integrations/chat/databricks/
- `Databricks`:
https://langchain-git-fork-b-step62-chat-databricks-doc-langchain.vercel.app/v0.2/docs/integrations/llms/databricks/
- `DatabricksEmbeddings`:
https://langchain-git-fork-b-step62-chat-databricks-doc-langchain.vercel.app/v0.2/docs/integrations/text_embedding/databricks/

- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

---------

Signed-off-by: B-Step62 <yuki.watanabe@databricks.com>
2 weeks ago
鹿鹿鹿鲨 6b46b5e9ce
community: add **request_kwargs and expect TimeError AsyncHtmlLoader (#23068)
- **Description:** add `**request_kwargs` and expect `TimeError` in
`_fetch` function for AsyncHtmlLoader. This allows you to fill in the
kwargs parameter when using the `load()` method of the `AsyncHtmlLoader`
class.

Co-authored-by: Yucolu <yucolu@tencent.com>
2 weeks ago
Leonid Ganeline 109a70fc64
ibm: docstrings (#23149)
Added missed docstrings. Format docstrings to the consistent format
(used in the API Reference)
2 weeks ago
Ryan Elston 86ee4f0daa
text-splitters: Introduce Experimental Markdown Syntax Splitter (#22257)
#### Description
This MR defines a `ExperimentalMarkdownSyntaxTextSplitter` class. The
main goal is to replicate the functionality of the original
`MarkdownHeaderTextSplitter` which extracts the header stack as metadata
but with one critical difference: it keeps the whitespace of the
original text intact.

This draft reimplements the `MarkdownHeaderTextSplitter` with a very
different algorithmic approach. Instead of marking up each line of the
text individually and aggregating them back together into chunks, this
method builds each chunk sequentially and applies the metadata to each
chunk. This makes the implementation simpler. However, since it's
designed to keep white space intact its not a full drop in replacement
for the original. Since it is a radical implementation change to the
original code and I would like to get feedback to see if this is a
worthwhile replacement, should be it's own class, or is not a good idea
at all.

Note: I implemented the `return_each_line` parameter but I don't think
it's a necessary feature. I'd prefer to remove it.

This implementation also adds the following additional features:
- Splits out code blocks and includes the language in the `"Code"`
metadata key
- Splits text on the horizontal rule `---` as well
- The `headers_to_split_on` parameter is now optional - with sensible
defaults that can be overridden.

#### Issue
Keeping the whitespace keeps the paragraphs structure and the formatting
of the code blocks intact which allows the caller much more flexibility
in how they want to further split the individuals sections of the
resulting documents. This addresses the issues brought up by the
community in the following issues:
- https://github.com/langchain-ai/langchain/issues/20823
- https://github.com/langchain-ai/langchain/issues/19436
- https://github.com/langchain-ai/langchain/issues/22256

#### Dependencies
N/A

#### Twitter handle
@RyanElston

---------

Co-authored-by: isaac hershenson <ihershenson@hmc.edu>
2 weeks ago
Bagatur 93d0ad97fe
anthropic[patch]: test image input (#23155) 2 weeks ago
Leonid Ganeline 3dfd055411
anthropic: docstrings (#23145)
Added missed docstrings. Format docstrings to the consistent format
(used in the API Reference)
2 weeks ago
Bagatur 90559fde70
openai[patch], standard-tests[patch]: don't pass in falsey stop vals (#23153)
adds an image input test to standard-tests as well
2 weeks ago
Bagatur e8a8286012
core[patch]: runnablewithchathistory from core.runnables (#23136) 2 weeks ago
Jacob Lee 2ae718796e
docs[patch]: Fix typo in feedback (#23146) 2 weeks ago
Jacob Lee 74749c909d
docs[patch]: Adds feedback input after thumbs up/down (#23141)
CC @baskaryan
2 weeks ago
Bagatur cf38981bb7
docs: use trim_messages in chatbot how to (#23139) 2 weeks ago
Vadym Barda b483bf5095
core[minor]: handle boolean data in draw_mermaid (#23135)
This change should address graph rendering issues for edges with boolean
data

Example from langgraph:

```python
from typing import Annotated, TypedDict

from langchain_core.messages import AnyMessage
from langgraph.graph import END, START, StateGraph
from langgraph.graph.message import add_messages


class State(TypedDict):
    messages: Annotated[list[AnyMessage], add_messages]


def branch(state: State) -> bool:
    return 1 + 1 == 3


graph_builder = StateGraph(State)
graph_builder.add_node("foo", lambda state: {"messages": [("ai", "foo")]})
graph_builder.add_node("bar", lambda state: {"messages": [("ai", "bar")]})

graph_builder.add_conditional_edges(
    START,
    branch,
    path_map={True: "foo", False: "bar"},
    then=END,
)

app = graph_builder.compile()
print(app.get_graph().draw_mermaid())
```

Previous behavior:

```python
AttributeError: 'bool' object has no attribute 'split'
```

Current behavior:

```python
%%{init: {'flowchart': {'curve': 'linear'}}}%%
graph TD;
	__start__[__start__]:::startclass;
	__end__[__end__]:::endclass;
	foo([foo]):::otherclass;
	bar([bar]):::otherclass;
	__start__ -. ('a',) .-> foo;
	foo --> __end__;
	__start__ -. ('b',) .-> bar;
	bar --> __end__;
	classDef startclass fill:#ffdfba;
	classDef endclass fill:#baffc9;
	classDef otherclass fill:#fad7de;
```
2 weeks ago
Bagatur 093ae04d58
core[patch]: Pin pydantic in py3.12.4 (#23130) 2 weeks ago
hmasdev ff0c06b1e5
langchain[patch]: fix `OutputType` of OutputParsers and fix legacy API in OutputParsers (#19792)
# Description

This pull request aims to address specific issues related to the
ambiguity and error-proneness of the output types of certain output
parsers, as well as the absence of unit tests for some parsers. These
issues could potentially lead to runtime errors or unexpected behaviors
due to type mismatches when used, causing confusion for developers and
users. Through clarifying output types, this PR seeks to improve the
stability and reliability.

Therefore, this pull request

- fixes the `OutputType` of OutputParsers to be the expected type;
- e.g. `OutputType` property of `EnumOutputParser` raises `TypeError`.
This PR introduce a logic to extract `OutputType` from its attribute.
- and fixes the legacy API in OutputParsers like `LLMChain.run` to the
modern API like `LLMChain.invoke`;
- Note: For `OutputFixingParser`, `RetryOutputParser` and
`RetryWithErrorOutputParser`, this PR introduces `legacy` attribute with
False as default value in order to keep the backward compatibility
- and adds the tests for the `OutputFixingParser` and
`RetryOutputParser`.

The following table shows my expected output and the actual output of
the `OutputType` of OutputParsers.
I have used this table to fix `OutputType` of OutputParsers.

| Class Name of OutputParser | My Expected `OutputType` (after this PR)|
Actual `OutputType` [evidence](#evidence) (before this PR)| Fix Required
|
|---------|--------------|---------|--------|
| BooleanOutputParser | `<class 'bool'>` | `<class 'bool'>` | NO |
| CombiningOutputParser | `typing.Dict[str, Any]` | `TypeError` is
raised | YES |
| DatetimeOutputParser | `<class 'datetime.datetime'>` | `<class
'datetime.datetime'>` | NO |
| EnumOutputParser(enum=MyEnum) | `MyEnum` | `TypeError` is raised | YES
|
| OutputFixingParser | The same type as `self.parser.OutputType` | `~T`
| YES |
| CommaSeparatedListOutputParser | `typing.List[str]` |
`typing.List[str]` | NO |
| MarkdownListOutputParser | `typing.List[str]` | `typing.List[str]` |
NO |
| NumberedListOutputParser | `typing.List[str]` | `typing.List[str]` |
NO |
| JsonOutputKeyToolsParser | `typing.Any` | `typing.Any` | NO |
| JsonOutputToolsParser | `typing.Any` | `typing.Any` | NO |
| PydanticToolsParser | `typing.Any` | `typing.Any` | NO |
| PandasDataFrameOutputParser | `typing.Dict[str, Any]` | `TypeError` is
raised | YES |
| PydanticOutputParser(pydantic_object=MyModel) | `<class
'__main__.MyModel'>` | `<class '__main__.MyModel'>` | NO |
| RegexParser | `typing.Dict[str, str]` | `TypeError` is raised | YES |
| RegexDictParser | `typing.Dict[str, str]` | `TypeError` is raised |
YES |
| RetryOutputParser | The same type as `self.parser.OutputType` | `~T` |
YES |
| RetryWithErrorOutputParser | The same type as `self.parser.OutputType`
| `~T` | YES |
| StructuredOutputParser | `typing.Dict[str, Any]` | `TypeError` is
raised | YES |
| YamlOutputParser(pydantic_object=MyModel) | `MyModel` | `~T` | YES |

NOTE: In "Fix Required", "YES" means that it is required to fix in this
PR while "NO" means that it is not required.

# Issue

No issues for this PR.

# Twitter handle

- [hmdev3](https://twitter.com/hmdev3)

# Questions:

1. Is it required to create tests for legacy APIs `LLMChain.run` in the
following scripts?
   - libs/langchain/tests/unit_tests/output_parsers/test_fix.py;
   - libs/langchain/tests/unit_tests/output_parsers/test_retry.py.

2. Is there a more appropriate expected output type than I expect in the
above table?
- e.g. the `OutputType` of `CombiningOutputParser` should be
SOMETHING...

# Actual outputs (before this PR)

<div id='evidence'></div>

<details><summary>Actual outputs</summary>

## Requirements

- Python==3.9.13
- langchain==0.1.13

```python
Python 3.9.13 (tags/v3.9.13:6de2ca5, May 17 2022, 16:36:42) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import langchain
>>> langchain.__version__
'0.1.13'
>>> from langchain import output_parsers
```

### `BooleanOutputParser`

```python
>>> output_parsers.BooleanOutputParser().OutputType
<class 'bool'>
```

### `CombiningOutputParser`

```python
>>> output_parsers.CombiningOutputParser(parsers=[output_parsers.DatetimeOutputParser(), output_parsers.CommaSeparatedListOutputParser()]).OutputType
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\workspace\venv\lib\site-packages\langchain_core\output_parsers\base.py", line 160, in OutputType
    raise TypeError(
TypeError: Runnable CombiningOutputParser doesn't have an inferable OutputType. Override the OutputType property to specify the output type.
```

### `DatetimeOutputParser`

```python
>>> output_parsers.DatetimeOutputParser().OutputType
<class 'datetime.datetime'>
```

### `EnumOutputParser`

```python
>>> from enum import Enum
>>> class MyEnum(Enum):
...     a = 'a'
...     b = 'b'
...
>>> output_parsers.EnumOutputParser(enum=MyEnum).OutputType
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\workspace\venv\lib\site-packages\langchain_core\output_parsers\base.py", line 160, in OutputType
    raise TypeError(
TypeError: Runnable EnumOutputParser doesn't have an inferable OutputType. Override the OutputType property to specify the output type.
```

### `OutputFixingParser`

```python
>>> output_parsers.OutputFixingParser(parser=output_parsers.DatetimeOutputParser()).OutputType
~T
```

### `CommaSeparatedListOutputParser`

```python
>>> output_parsers.CommaSeparatedListOutputParser().OutputType
typing.List[str]
```

### `MarkdownListOutputParser`

```python
>>> output_parsers.MarkdownListOutputParser().OutputType
typing.List[str]
```

### `NumberedListOutputParser`

```python
>>> output_parsers.NumberedListOutputParser().OutputType
typing.List[str]
```

### `JsonOutputKeyToolsParser`

```python
>>> output_parsers.JsonOutputKeyToolsParser(key_name='tool').OutputType
typing.Any
```

### `JsonOutputToolsParser`

```python
>>> output_parsers.JsonOutputToolsParser().OutputType
typing.Any
```

### `PydanticToolsParser`

```python
>>> from langchain.pydantic_v1 import BaseModel
>>> class MyModel(BaseModel):
...     a: int
...
>>> output_parsers.PydanticToolsParser(tools=[MyModel, MyModel]).OutputType
typing.Any
```

### `PandasDataFrameOutputParser`

```python
>>> output_parsers.PandasDataFrameOutputParser().OutputType
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\workspace\venv\lib\site-packages\langchain_core\output_parsers\base.py", line 160, in OutputType
    raise TypeError(
TypeError: Runnable PandasDataFrameOutputParser doesn't have an inferable OutputType. Override the OutputType property to specify the output type.
```

### `PydanticOutputParser`

```python
>>> output_parsers.PydanticOutputParser(pydantic_object=MyModel).OutputType
<class '__main__.MyModel'>
```

### `RegexParser`

```python
>>> output_parsers.RegexParser(regex='$', output_keys=['a']).OutputType
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\workspace\venv\lib\site-packages\langchain_core\output_parsers\base.py", line 160, in OutputType
    raise TypeError(
TypeError: Runnable RegexParser doesn't have an inferable OutputType. Override the OutputType property to specify the output type.
```

### `RegexDictParser`

```python
>>> output_parsers.RegexDictParser(output_key_to_format={'a':'a'}).OutputType
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\workspace\venv\lib\site-packages\langchain_core\output_parsers\base.py", line 160, in OutputType
    raise TypeError(
TypeError: Runnable RegexDictParser doesn't have an inferable OutputType. Override the OutputType property to specify the output type.
```

### `RetryOutputParser`

```python
>>> output_parsers.RetryOutputParser(parser=output_parsers.DatetimeOutputParser()).OutputType
~T
```

### `RetryWithErrorOutputParser`

```python
>>> output_parsers.RetryWithErrorOutputParser(parser=output_parsers.DatetimeOutputParser()).OutputType
~T
```

### `StructuredOutputParser`

```python
>>> from langchain.output_parsers.structured import ResponseSchema
>>> response_schemas = [ResponseSchema(name="foo",description="a list of strings",type="List[string]"),ResponseSchema(name="bar",description="a string",type="string"), ]
>>> output_parsers.StructuredOutputParser.from_response_schemas(response_schemas).OutputType
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\workspace\venv\lib\site-packages\langchain_core\output_parsers\base.py", line 160, in OutputType
    raise TypeError(
TypeError: Runnable StructuredOutputParser doesn't have an inferable OutputType. Override the OutputType property to specify the output type.
```

### `YamlOutputParser`

```python
>>> output_parsers.YamlOutputParser(pydantic_object=MyModel).OutputType
~T
```


<div>

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2 weeks ago
Artem Mukhin e271f75bee
docs: Fix URL formatting in deprecation warnings (#23075)
**Description**

Updated the URLs in deprecation warning messages. The URLs were
previously written as raw strings and are now formatted to be clickable
HTML links.

Example of a broken link in the current API Reference:
https://api.python.langchain.com/en/latest/chains/langchain.chains.openai_functions.extraction.create_extraction_chain_pydantic.html

<img width="942" alt="Screenshot 2024-06-18 at 13 21 07"
src="https://github.com/langchain-ai/langchain/assets/4854600/a1b1863c-cd03-4af2-a9bc-70375407fb00">
2 weeks ago
Gabriel Petracca c6660df58e
community[minor]: Implement Doctran async execution (#22372)
**Description**

The DoctranTextTranslator has an async transform function that was not
implemented because [the doctran
library](https://github.com/psychic-api/doctran) uses a sync version of
the `execute` method.

- I implemented the `DoctranTextTranslator.atransform_documents()`
method using `asyncio.to_thread` to run the function in a separate
thread.
- I updated the example in the Notebook with the new async version.
- The performance improvements can be appreciated when a big document is
divided into multiple chunks.

Relates to:
- Issue #14645: https://github.com/langchain-ai/langchain/issues/14645
- Issue #14437: https://github.com/langchain-ai/langchain/issues/14437
- https://github.com/langchain-ai/langchain/pull/15264

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2 weeks ago
Eugene Yurtsev aa6415aa7d
core[minor]: Support multiple keys in get_from_dict_or_env (#23086)
Support passing multiple keys for ge_from_dict_or_env
2 weeks ago
nold 226802f0c4
community: add args_schema to SearxSearch (#22954)
This change adds args_schema (pydantic BaseModel) to SearxSearchRun for
correct schema formatting on LLM function calls

Issue: currently using SearxSearchRun with OpenAI function calling
returns the following error "TypeError: SearxSearchRun._run() got an
unexpected keyword argument '__arg1' ".

This happens because the schema sent to the LLM is "input:
'{"__arg1":"foobar"}'" while the method should be called with the
"query" parameter.

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2 weeks ago
Bagatur 01783d67fc
core[patch]: Release 0.2.9 (#23091) 2 weeks ago
Finlay Macklon 616d06d7fe
community: glob multiple patterns when using DirectoryLoader (#22852)
- **Description:** Updated
*community.langchain_community.document_loaders.directory.py* to enable
the use of multiple glob patterns in the `DirectoryLoader` class. Now,
the glob parameter is of type `list[str] | str` and still defaults to
the same value as before. I updated the docstring of the class to
reflect this, and added a unit test to
*community.tests.unit_tests.document_loaders.test_directory.py* named
`test_directory_loader_glob_multiple`. This test also shows an example
of how to use the new functionality.
- ~~Issue:~~**Discussion Thread:**
https://github.com/langchain-ai/langchain/discussions/18559
- **Dependencies:** None
- **Twitter handle:** N/a

- [x] **Add tests and docs**
    - Added test (described above)
    - Updated class docstring

- [x] **Lint and test**

---------

Co-authored-by: isaac hershenson <ihershenson@hmc.edu>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>
2 weeks ago
Eugene Yurtsev 5564d9e404
core[patch]: Document BaseStore (#23082)
Add doc-string to BaseStore
2 weeks ago
Takuya Igei 9f791b6ad5
core[patch],community[patch],langchain[patch]: `tenacity` dependency to version `>=8.1.0,<8.4.0` (#22973)
Fix https://github.com/langchain-ai/langchain/issues/22972.

- [x] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
  - Example: "community: add foobar LLM"


- [x] **PR message**: ***Delete this entire checklist*** and replace
with
    - **Description:** a description of the change
    - **Issue:** the issue # it fixes, if applicable
    - **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!


- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.
2 weeks ago
Raghav Dixit 74c4cbb859
LanceDB example minor change (#23069)
Removed package version `0.6.13` in the example.
2 weeks ago
Bagatur ddfbca38df
docs: add trim_messages to chatbot (#23061) 2 weeks ago
Lance Martin 931b41b30f
Update Fireworks link (#23058) 2 weeks ago
Leonid Ganeline 6a66d8e2ca
docs: `AWS` platform page update (#23063)
Added a reference to the `GlueCatalogLoader` new document loader.
2 weeks ago
Raviraj 858ce264ef
SemanticChunker : Feature Addition ("Semantic Splitting with gradient") (#22895)
```SemanticChunker``` currently provide three methods to split the texts semantically:
- percentile
- standard_deviation
- interquartile

I propose new method ```gradient```. In this method, the gradient of distance is used to split chunks along with the percentile method (technically) . This method is useful when chunks are highly correlated with each other or specific to a domain e.g. legal or medical. The idea is to apply anomaly detection on gradient array so that the distribution become wider and easy to identify boundaries in highly semantic data.
I have tested this merge on a set of 10 domain specific documents (mostly legal).

Details : 
    - **Issue:** Improvement
    - **Dependencies:** NA
    - **Twitter handle:** [x.com/prajapat_ravi](https://x.com/prajapat_ravi)


@hwchase17

---------

Co-authored-by: Raviraj Prajapat <raviraj.prajapat@sirionlabs.com>
Co-authored-by: isaac hershenson <ihershenson@hmc.edu>
2 weeks ago
Raghav Dixit 55705c0f5e
LanceDB integration update (#22869)
Added : 

- [x] relevance search (w/wo scores)
- [x] maximal marginal search
- [x] image ingestion
- [x] filtering support
- [x] hybrid search w reranking 

make test, lint_diff and format checked.
2 weeks ago
Chang Liu 62c8a67f56
community: add KafkaChatMessageHistory (#22216)
Add chat history store based on Kafka.

Files added: 
`libs/community/langchain_community/chat_message_histories/kafka.py`
`docs/docs/integrations/memory/kafka_chat_message_history.ipynb`

New issue to be created for future improvement:
1. Async method implementation.
2. Message retrieval based on timestamp.
3. Support for other configs when connecting to cloud hosted Kafka (e.g.
add `api_key` field)
4. Improve unit testing & integration testing.
2 weeks ago
shimajiroxyz 3e835a1aa1
langchain: add id_key option to EnsembleRetriever for metadata-based document merging (#22950)
**Description:**
- What I changed
- By specifying the `id_key` during the initialization of
`EnsembleRetriever`, it is now possible to determine which documents to
merge scores for based on the value corresponding to the `id_key`
element in the metadata, instead of `page_content`. Below is an example
of how to use the modified `EnsembleRetriever`:
    ```python
retriever = EnsembleRetriever(retrievers=[ret1, ret2], id_key="id") #
The Document returned by each retriever must keep the "id" key in its
metadata.
    ```

- Additionally, I added a script to easily test the behavior of the
`invoke` method of the modified `EnsembleRetriever`.

- Why I changed
- There are cases where you may want to calculate scores by treating
Documents with different `page_content` as the same when using
`EnsembleRetriever`. For example, when you want to ensemble the search
results of the same document described in two different languages.
- The previous `EnsembleRetriever` used `page_content` as the basis for
score aggregation, making the above usage difficult. Therefore, the
score is now calculated based on the specified key value in the
Document's metadata.

**Twitter handle:** @shimajiroxyz
2 weeks ago
mackong 39f6c4169d
langchain[patch]: add tool messages formatter for tool calling agent (#22849)
- **Description:** add tool_messages_formatter for tool calling agent,
make tool messages can be formatted in different ways for your LLM.
  - **Issue:** N/A
  - **Dependencies:** N/A
2 weeks ago
Lucas Tucker e25a5966b5
docs: Standardize DocumentLoader docstrings (#22932)
**Standardizing DocumentLoader docstrings (of which there are many)**

This PR addresses issue #22866 and adds docstrings according to the
issue's specified format (in the appendix) for files csv_loader.py and
json_loader.py in langchain_community.document_loaders. In particular,
the following sections have been added to both CSVLoader and JSONLoader:
Setup, Instantiate, Load, Async load, and Lazy load. It may be worth
adding a 'Metadata' section to the JSONLoader docstring to clarify how
we want to extract the JSON metadata (using the `metadata_func`
argument). The files I used to walkthrough the various sections were
`example_2.json` from
[HERE](https://support.oneskyapp.com/hc/en-us/articles/208047697-JSON-sample-files)
and `hw_200.csv` from
[HERE](https://people.sc.fsu.edu/~jburkardt/data/csv/csv.html).

---------

Co-authored-by: lucast2021 <lucast2021@headroyce.org>
Co-authored-by: isaac hershenson <ihershenson@hmc.edu>
2 weeks ago
Leonid Ganeline a56ff199a7
docs: embeddings classes (#22927)
Added a table with all Embedding classes.
2 weeks ago
Mohammad Mohtashim 60ba02f5db
[Community]: Fixed DDG DuckDuckGoSearchResults Docstring (#22968)
- **Description:** A very small fix in the Docstring of
`DuckDuckGoSearchResults` identified in the following issue.
- **Issue:** #22961

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2 weeks ago
Eun Hye Kim 70761af8cf
community: Fix #22975 (Add SSL Verification Option to Requests Class in langchain_community) (#22977)
- **PR title**: "community: Fix #22975 (Add SSL Verification Option to
Requests Class in langchain_community)"
- **PR message**: 
    - **Description:**
- Added an optional verify parameter to the Requests class with a
default value of True.
- Modified the get, post, patch, put, and delete methods to include the
verify parameter.
- Updated the _arequest async context manager to include the verify
parameter.
- Added the verify parameter to the GenericRequestsWrapper class and
passed it to the Requests class.
    - **Issue:** This PR fixes issue #22975.
- **Dependencies:** No additional dependencies are required for this
change.
    - **Twitter handle:** @lunara_x

You can check this change with below code.
```python
from langchain_openai.chat_models import ChatOpenAI
from langchain.requests import RequestsWrapper
from langchain_community.agent_toolkits.openapi import planner
from langchain_community.agent_toolkits.openapi.spec import reduce_openapi_spec

with open("swagger.yaml") as f:
    data = yaml.load(f, Loader=yaml.FullLoader)
swagger_api_spec = reduce_openapi_spec(data)

llm = ChatOpenAI(model='gpt-4o')
swagger_requests_wrapper = RequestsWrapper(verify=False) # modified point
superset_agent = planner.create_openapi_agent(swagger_api_spec, swagger_requests_wrapper, llm, allow_dangerous_requests=True, handle_parsing_errors=True)

superset_agent.run(
    "Tell me the number and types of charts and dashboards available."
)
```

---------

Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
2 weeks ago
Mohammad Mohtashim bf839676c7
[Community]: FIxed the DocumentDBVectorSearch `_similarity_search_without_score` (#22970)
- **Description:** The PR #22777 introduced a bug in
`_similarity_search_without_score` which was raising the
`OperationFailure` error. The mistake was syntax error for MongoDB
pipeline which has been corrected now.
    - **Issue:** #22770
2 weeks ago
Nuno Campos f01f12ce1e
Include "no escape" and "inverted section" mustache vars in Prompt.input_variables and Prompt.input_schema (#22981) 2 weeks ago
Bella Be 7a0b36501f
docs: Update how to docs for pydantic compatibility (#22983)
Add missing imports in docs from langchain_core.tools  BaseTool

---------

Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
2 weeks ago
Jacob Lee 3b7b276f6f
docs[patch]: Adds evaluation sections (#23050)
Also want to add an index/rollup page to LangSmith docs to enable
linking to a how-to category as a group (e.g.
https://docs.smith.langchain.com/how_to_guides/evaluation/)

CC @agola11 @hinthornw
2 weeks ago
Jacob Lee 6605ae22f6
docs[patch]: Update docs links (#23013) 2 weeks ago
Bagatur c2b2e3266c
core[minor]: message transformer utils (#22752) 2 weeks ago
Qingchuan Hao c5e0acf6f0
docs: add bing search integration to agent (#22929)
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
2 weeks ago
Anders Swanson aacc6198b9
community: OCI GenAI embedding batch size (#22986)
Thank you for contributing to LangChain!

- [x] **PR title**: "community: OCI GenAI embedding batch size"



- [x] **PR message**:
    - **Issue:** #22985 


- [ ] **Add tests and docs**: N/A


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.

If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.

---------

Signed-off-by: Anders Swanson <anders.swanson@oracle.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2 weeks ago
Bagatur 8235bae48e
core[patch]: Release 0.2.8 (#23012) 2 weeks ago
Bagatur 5ee6e22983
infra: test all dependents on any change (#22994) 2 weeks ago
Nuno Campos bd4b68cd54
core: run_in_executor: Wrap StopIteration in RuntimeError (#22997)
- StopIteration can't be set on an asyncio.Future it raises a TypeError
and leaves the Future pending forever so we need to convert it to a
RuntimeError
2 weeks ago
Bagatur d96f67b06f
standard-tests[patch]: Update chat model standard tests (#22378)
- Refactor standard test classes to make them easier to configure
- Update openai to support stop_sequences init param
- Update groq to support stop_sequences init param
- Update fireworks to support max_retries init param
- Update ChatModel.bind_tools to type tool_choice
- Update groq to handle tool_choice="any". **this may be controversial**

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2 weeks ago
Bob Lin 14f0cdad58
docs: Add some 3rd party tutorials (#22931)
Langchain is very popular among developers in China, but there are still
no good Chinese books or documents, so I want to add my own Chinese
resources on langchain topics, hoping to give Chinese readers a better
experience using langchain. This is not a translation of the official
langchain documentation, but my understanding.

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2 weeks ago