langchain

mirror of https://github.com/hwchase17/langchain synced 2024-11-16 06:13:16 +00:00

Author	SHA1	Message	Date
David DeCaprio	a4bcb45f65	core:Add optional max_messages to MessagePlaceholder (#16098 ) - Description: Add optional max_messages to MessagePlaceholder - Issue: [16096](https://github.com/langchain-ai/langchain/issues/16096) - Dependencies: None - Twitter handle: @davedecaprio Sometimes it's better to limit the history in the prompt itself rather than the memory. This is needed if you want different prompts in the chain to have different history lengths. --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2024-06-19 23:39:51 +00:00
shaunakgodbole	7193634ae6	fireworks[patch]: fix api_key alias in Fireworks LLM (#23118 ) Thank you for contributing to LangChain! Description The current code snippet for `Fireworks` had incorrect parameters. This PR fixes those parameters. --------- Co-authored-by: Chester Curme <chester.curme@gmail.com> Co-authored-by: Bagatur <baskaryan@gmail.com>	2024-06-19 21:14:42 +00:00
Eugene Yurtsev	1fcf875fe3	core[patch]: Document agent schema (#23194 ) * Document agent schema * Refer folks to langgraph for more information on how to create agents.	2024-06-19 20:16:57 +00:00
Eugene Yurtsev	c2d43544cc	core[patch]: Document messages namespace (#23154 ) - Moved doc-strings below attribtues in TypedDicts -- seems to render better on APIReference pages. * Provided more description and some simple code examples	2024-06-19 15:00:00 -04:00
Eugene Yurtsev	3c917204dc	core[patch]: Add doc-strings to outputs, fix @root_validator (#23190 ) - Document outputs namespace - Update a vanilla @root_validator that was missed	2024-06-19 14:59:06 -04:00
Bagatur	8698cb9b28	infra: add more formatter rules to openai (#23189 ) Turns on https://docs.astral.sh/ruff/settings/#format_docstring-code-format and https://docs.astral.sh/ruff/settings/#format_skip-magic-trailing-comma ```toml [tool.ruff.format] docstring-code-format = true skip-magic-trailing-comma = true ```	2024-06-19 11:39:58 -07:00
Michał Krassowski	710197e18c	community[patch]: restore compatibility with SQLAlchemy 1.x (#22546 ) - Description: Restores compatibility with SQLAlchemy 1.4.x that was broken since #18992 and adds a test run for this version on CI (only for Python 3.11) - Issue: fixes #19681 - Dependencies: None - Twitter handle: `@krassowski_m` --------- Co-authored-by: Erick Friis <erick@langchain.dev>	2024-06-19 17:58:57 +00:00
Erick Friis	48d6ea427f	upstage: move to external repo (#22506 )	2024-06-19 17:56:07 +00:00
Bagatur	0a4ee864e9	openai[patch]: image token counting (#23147 ) Resolves #23000 --------- Co-authored-by: isaac hershenson <ihershenson@hmc.edu> Co-authored-by: ccurme <chester.curme@gmail.com>	2024-06-19 10:41:47 -07:00
Jorge Piedrahita Ortiz	b3e53ffca0	community[patch]: sambanova llm integration improvement (#23137 ) - Description: sambanova sambaverse integration improvement: removed input parsing that was changing raw user input, and was making to use process prompt parameter as true mandatory	2024-06-19 10:30:14 -07:00
Jorge Piedrahita Ortiz	e162893d7f	community[patch]: update sambastudio embeddings (#23133 ) Description: update sambastudio embeddings integration, now compatible with generic endpoints and CoE endpoints	2024-06-19 10:26:56 -07:00
Philippe PRADOS	db6f46c1a6	langchain[small]: Change type to BasePromptTemplate (#23083 ) ```python Change from_llm( prompt: PromptTemplate ... ) ``` to ```python Change from_llm( prompt: BasePromptTemplate ... ) ```	2024-06-19 13:19:36 -04:00
Sergey Kozlov	94452a94b1	core[patch[: add exceptions propagation test for astream_events v2 (#23159 ) Description: `astream_events(version="v2")` didn't propagate exceptions in `langchain-core<=0.2.6`, fixed in the #22916. This PR adds a unit test to check that exceptions are propagated upwards. Co-authored-by: Sergey Kozlov <sergey.kozlov@ludditelabs.io>	2024-06-19 13:00:25 -04:00
Leonid Ganeline	50484be330	prompty: docstring (#23152 ) Added missed docstrings. Format docstrings to the consistent format (used in the API Reference) --------- Co-authored-by: ccurme <chester.curme@gmail.com>	2024-06-19 12:50:58 -04:00
chenxi	505a2e8743	fix: MoonshotChat fails when setting the moonshot_api_key through the OS environment. (#23176 ) Close #23174 Co-authored-by: tianming <tianming@bytenew.com>	2024-06-19 16:28:24 +00:00
Bagatur	677408bfc9	core[patch]: fix chat history circular import (#23182 )	2024-06-19 09:08:36 -07:00
Eugene Yurtsev	883e90d06e	core[patch]: Add an example to the Document schema doc-string (#23131 ) Add an example to the document schema	2024-06-19 11:35:30 -04:00
ccurme	2b08e9e265	core[patch]: update test to catch circular imports (#23172 ) This raises ImportError due to a circular import: ```python from langchain_core import chat_history ``` This does not: ```python from langchain_core import runnables from langchain_core import chat_history ``` Here we update `test_imports` to run each import in a separate subprocess. Open to other ways of doing this!	2024-06-19 15:24:38 +00:00
Eugene Yurtsev	ae4c0ed25a	core[patch]: Add documentation to load namespace (#23143 ) Document some of the modules within the load namespace	2024-06-19 15:21:41 +00:00
Eugene Yurtsev	a34e650f8b	core[patch]: Add doc-string to document compressor (#23085 )	2024-06-19 11:03:49 -04:00
Eugene Yurtsev	1007a715a5	community[patch]: Prevent unit tests from making network requests (#23180 ) * Prevent unit tests from making network requests	2024-06-19 14:56:30 +00:00
ccurme	ca798bc6ea	community: move test to integration tests (#23178 ) Tests failing on master with > FAILED tests/unit_tests/embeddings/test_ovhcloud.py::test_ovhcloud_embed_documents - ValueError: Request failed with status code: 401, {"message":"Bad token; invalid JSON"}	2024-06-19 14:39:48 +00:00
Eugene Yurtsev	4fe8403bfb	core[patch]: Expand documentation in the indexing namespace (#23134 )	2024-06-19 10:11:44 -04:00
Eugene Yurtsev	fe4f10047b	core[patch]: Document embeddings namespace (#23132 ) Document embeddings namespace	2024-06-19 10:11:16 -04:00
Eugene Yurtsev	a3bae56a48	core[patch]: Update documentation in LLM namespace (#23138 ) Update documentation in lllm namespace.	2024-06-19 10:10:50 -04:00
Leonid Ganeline	a70b7a688e	ai21: docstrings (#23142 ) Added missed docstrings. Format docstrings to the consistent format (used in the API Reference)	2024-06-19 08:51:15 -04:00
bilk0h	3d54784e6d	text-splitters: Fix/recursive json splitter data persistence issue (#21529 ) Thank you for contributing to LangChain! Description: Noticed an issue with when I was calling `RecursiveJsonSplitter().split_json()` multiple times that I was getting weird results. I found an issue where `chunks` list in the `_json_split` method. If chunks is not provided when _json_split (which is the case when split_json calls _json_split) then the same list is used for subsequent calls to `_json_split`. You can see this in the test case i also added to this commit. Output should be: ``` [{'a': 1, 'b': 2}] [{'c': 3, 'd': 4}] ``` Instead you get: ``` [{'a': 1, 'b': 2}] [{'a': 1, 'b': 2, 'c': 3, 'd': 4}] ``` --------- Co-authored-by: Nuno Campos <nuno@langchain.dev> Co-authored-by: isaac hershenson <ihershenson@hmc.edu> Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>	2024-06-18 20:21:55 -07:00
鹿鹿鹿鲨	6b46b5e9ce	community: add request_kwargs and expect TimeError AsyncHtmlLoader (#23068 ) - Description: add `request_kwargs` and expect `TimeError` in `_fetch` function for AsyncHtmlLoader. This allows you to fill in the kwargs parameter when using the `load()` method of the `AsyncHtmlLoader` class. Co-authored-by: Yucolu <yucolu@tencent.com>	2024-06-18 20:02:46 -07:00
Leonid Ganeline	109a70fc64	ibm: docstrings (#23149 ) Added missed docstrings. Format docstrings to the consistent format (used in the API Reference)	2024-06-18 20:00:27 -07:00
Ryan Elston	86ee4f0daa	text-splitters: Introduce Experimental Markdown Syntax Splitter (#22257 ) #### Description This MR defines a `ExperimentalMarkdownSyntaxTextSplitter` class. The main goal is to replicate the functionality of the original `MarkdownHeaderTextSplitter` which extracts the header stack as metadata but with one critical difference: it keeps the whitespace of the original text intact. This draft reimplements the `MarkdownHeaderTextSplitter` with a very different algorithmic approach. Instead of marking up each line of the text individually and aggregating them back together into chunks, this method builds each chunk sequentially and applies the metadata to each chunk. This makes the implementation simpler. However, since it's designed to keep white space intact its not a full drop in replacement for the original. Since it is a radical implementation change to the original code and I would like to get feedback to see if this is a worthwhile replacement, should be it's own class, or is not a good idea at all. Note: I implemented the `return_each_line` parameter but I don't think it's a necessary feature. I'd prefer to remove it. This implementation also adds the following additional features: - Splits out code blocks and includes the language in the `"Code"` metadata key - Splits text on the horizontal rule `---` as well - The `headers_to_split_on` parameter is now optional - with sensible defaults that can be overridden. #### Issue Keeping the whitespace keeps the paragraphs structure and the formatting of the code blocks intact which allows the caller much more flexibility in how they want to further split the individuals sections of the resulting documents. This addresses the issues brought up by the community in the following issues: - https://github.com/langchain-ai/langchain/issues/20823 - https://github.com/langchain-ai/langchain/issues/19436 - https://github.com/langchain-ai/langchain/issues/22256 #### Dependencies N/A #### Twitter handle @RyanElston --------- Co-authored-by: isaac hershenson <ihershenson@hmc.edu>	2024-06-18 19:44:00 -07:00
Bagatur	93d0ad97fe	anthropic[patch]: test image input (#23155 )	2024-06-19 02:32:15 +00:00
Leonid Ganeline	3dfd055411	anthropic: docstrings (#23145 ) Added missed docstrings. Format docstrings to the consistent format (used in the API Reference)	2024-06-18 22:26:45 -04:00
Bagatur	90559fde70	openai[patch], standard-tests[patch]: don't pass in falsey stop vals (#23153 ) adds an image input test to standard-tests as well	2024-06-18 18:13:13 -07:00
Bagatur	e8a8286012	core[patch]: runnablewithchathistory from core.runnables (#23136 )	2024-06-19 00:15:18 +00:00
Vadym Barda	b483bf5095	core[minor]: handle boolean data in draw_mermaid (#23135 ) This change should address graph rendering issues for edges with boolean data Example from langgraph: ```python from typing import Annotated, TypedDict from langchain_core.messages import AnyMessage from langgraph.graph import END, START, StateGraph from langgraph.graph.message import add_messages class State(TypedDict): messages: Annotated[list[AnyMessage], add_messages] def branch(state: State) -> bool: return 1 + 1 == 3 graph_builder = StateGraph(State) graph_builder.add_node("foo", lambda state: {"messages": [("ai", "foo")]}) graph_builder.add_node("bar", lambda state: {"messages": [("ai", "bar")]}) graph_builder.add_conditional_edges( START, branch, path_map={True: "foo", False: "bar"}, then=END, ) app = graph_builder.compile() print(app.get_graph().draw_mermaid()) ``` Previous behavior: ```python AttributeError: 'bool' object has no attribute 'split' ``` Current behavior: ```python %%{init: {'flowchart': {'curve': 'linear'}}}%% graph TD; __start__[__start__]:::startclass; __end__[__end__]:::endclass; foo([foo]):::otherclass; bar([bar]):::otherclass; __start__ -. ('a',) .-> foo; foo --> __end__; __start__ -. ('b',) .-> bar; bar --> __end__; classDef startclass fill:#ffdfba; classDef endclass fill:#baffc9; classDef otherclass fill:#fad7de; ```	2024-06-18 20:15:42 +00:00
Bagatur	093ae04d58	core[patch]: Pin pydantic in py3.12.4 (#23130 )	2024-06-18 12:00:02 -07:00
hmasdev	ff0c06b1e5	langchain[patch]: fix `OutputType` of OutputParsers and fix legacy API in OutputParsers (#19792 ) # Description This pull request aims to address specific issues related to the ambiguity and error-proneness of the output types of certain output parsers, as well as the absence of unit tests for some parsers. These issues could potentially lead to runtime errors or unexpected behaviors due to type mismatches when used, causing confusion for developers and users. Through clarifying output types, this PR seeks to improve the stability and reliability. Therefore, this pull request - fixes the `OutputType` of OutputParsers to be the expected type; - e.g. `OutputType` property of `EnumOutputParser` raises `TypeError`. This PR introduce a logic to extract `OutputType` from its attribute. - and fixes the legacy API in OutputParsers like `LLMChain.run` to the modern API like `LLMChain.invoke`; - Note: For `OutputFixingParser`, `RetryOutputParser` and `RetryWithErrorOutputParser`, this PR introduces `legacy` attribute with False as default value in order to keep the backward compatibility - and adds the tests for the `OutputFixingParser` and `RetryOutputParser`. The following table shows my expected output and the actual output of the `OutputType` of OutputParsers. I have used this table to fix `OutputType` of OutputParsers. \| Class Name of OutputParser \| My Expected `OutputType` (after this PR)\| Actual `OutputType` [evidence](#evidence) (before this PR)\| Fix Required \| \|---------\|--------------\|---------\|--------\| \| BooleanOutputParser \| `<class 'bool'>` \| `<class 'bool'>` \| NO \| \| CombiningOutputParser \| `typing.Dict[str, Any]` \| `TypeError` is raised \| YES \| \| DatetimeOutputParser \| `<class 'datetime.datetime'>` \| `<class 'datetime.datetime'>` \| NO \| \| EnumOutputParser(enum=MyEnum) \| `MyEnum` \| `TypeError` is raised \| YES \| \| OutputFixingParser \| The same type as `self.parser.OutputType` \| `~T` \| YES \| \| CommaSeparatedListOutputParser \| `typing.List[str]` \| `typing.List[str]` \| NO \| \| MarkdownListOutputParser \| `typing.List[str]` \| `typing.List[str]` \| NO \| \| NumberedListOutputParser \| `typing.List[str]` \| `typing.List[str]` \| NO \| \| JsonOutputKeyToolsParser \| `typing.Any` \| `typing.Any` \| NO \| \| JsonOutputToolsParser \| `typing.Any` \| `typing.Any` \| NO \| \| PydanticToolsParser \| `typing.Any` \| `typing.Any` \| NO \| \| PandasDataFrameOutputParser \| `typing.Dict[str, Any]` \| `TypeError` is raised \| YES \| \| PydanticOutputParser(pydantic_object=MyModel) \| `<class '__main__.MyModel'>` \| `<class '__main__.MyModel'>` \| NO \| \| RegexParser \| `typing.Dict[str, str]` \| `TypeError` is raised \| YES \| \| RegexDictParser \| `typing.Dict[str, str]` \| `TypeError` is raised \| YES \| \| RetryOutputParser \| The same type as `self.parser.OutputType` \| `~T` \| YES \| \| RetryWithErrorOutputParser \| The same type as `self.parser.OutputType` \| `~T` \| YES \| \| StructuredOutputParser \| `typing.Dict[str, Any]` \| `TypeError` is raised \| YES \| \| YamlOutputParser(pydantic_object=MyModel) \| `MyModel` \| `~T` \| YES \| NOTE: In "Fix Required", "YES" means that it is required to fix in this PR while "NO" means that it is not required. # Issue No issues for this PR. # Twitter handle - [hmdev3](https://twitter.com/hmdev3) # Questions: 1. Is it required to create tests for legacy APIs `LLMChain.run` in the following scripts? - libs/langchain/tests/unit_tests/output_parsers/test_fix.py; - libs/langchain/tests/unit_tests/output_parsers/test_retry.py. 2. Is there a more appropriate expected output type than I expect in the above table? - e.g. the `OutputType` of `CombiningOutputParser` should be SOMETHING... # Actual outputs (before this PR) <div id='evidence'></div> <details><summary>Actual outputs</summary> ## Requirements - Python==3.9.13 - langchain==0.1.13 ```python Python 3.9.13 (tags/v3.9.13:6de2ca5, May 17 2022, 16:36:42) [MSC v.1929 64 bit (AMD64)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> import langchain >>> langchain.__version__ '0.1.13' >>> from langchain import output_parsers ``` ### `BooleanOutputParser` ```python >>> output_parsers.BooleanOutputParser().OutputType <class 'bool'> ``` ### `CombiningOutputParser` ```python >>> output_parsers.CombiningOutputParser(parsers=[output_parsers.DatetimeOutputParser(), output_parsers.CommaSeparatedListOutputParser()]).OutputType Traceback (most recent call last): File "<stdin>", line 1, in <module> File "D:\workspace\venv\lib\site-packages\langchain_core\output_parsers\base.py", line 160, in OutputType raise TypeError( TypeError: Runnable CombiningOutputParser doesn't have an inferable OutputType. Override the OutputType property to specify the output type. ``` ### `DatetimeOutputParser` ```python >>> output_parsers.DatetimeOutputParser().OutputType <class 'datetime.datetime'> ``` ### `EnumOutputParser` ```python >>> from enum import Enum >>> class MyEnum(Enum): ... a = 'a' ... b = 'b' ... >>> output_parsers.EnumOutputParser(enum=MyEnum).OutputType Traceback (most recent call last): File "<stdin>", line 1, in <module> File "D:\workspace\venv\lib\site-packages\langchain_core\output_parsers\base.py", line 160, in OutputType raise TypeError( TypeError: Runnable EnumOutputParser doesn't have an inferable OutputType. Override the OutputType property to specify the output type. ``` ### `OutputFixingParser` ```python >>> output_parsers.OutputFixingParser(parser=output_parsers.DatetimeOutputParser()).OutputType ~T ``` ### `CommaSeparatedListOutputParser` ```python >>> output_parsers.CommaSeparatedListOutputParser().OutputType typing.List[str] ``` ### `MarkdownListOutputParser` ```python >>> output_parsers.MarkdownListOutputParser().OutputType typing.List[str] ``` ### `NumberedListOutputParser` ```python >>> output_parsers.NumberedListOutputParser().OutputType typing.List[str] ``` ### `JsonOutputKeyToolsParser` ```python >>> output_parsers.JsonOutputKeyToolsParser(key_name='tool').OutputType typing.Any ``` ### `JsonOutputToolsParser` ```python >>> output_parsers.JsonOutputToolsParser().OutputType typing.Any ``` ### `PydanticToolsParser` ```python >>> from langchain.pydantic_v1 import BaseModel >>> class MyModel(BaseModel): ... a: int ... >>> output_parsers.PydanticToolsParser(tools=[MyModel, MyModel]).OutputType typing.Any ``` ### `PandasDataFrameOutputParser` ```python >>> output_parsers.PandasDataFrameOutputParser().OutputType Traceback (most recent call last): File "<stdin>", line 1, in <module> File "D:\workspace\venv\lib\site-packages\langchain_core\output_parsers\base.py", line 160, in OutputType raise TypeError( TypeError: Runnable PandasDataFrameOutputParser doesn't have an inferable OutputType. Override the OutputType property to specify the output type. ``` ### `PydanticOutputParser` ```python >>> output_parsers.PydanticOutputParser(pydantic_object=MyModel).OutputType <class '__main__.MyModel'> ``` ### `RegexParser` ```python >>> output_parsers.RegexParser(regex='$', output_keys=['a']).OutputType Traceback (most recent call last): File "<stdin>", line 1, in <module> File "D:\workspace\venv\lib\site-packages\langchain_core\output_parsers\base.py", line 160, in OutputType raise TypeError( TypeError: Runnable RegexParser doesn't have an inferable OutputType. Override the OutputType property to specify the output type. ``` ### `RegexDictParser` ```python >>> output_parsers.RegexDictParser(output_key_to_format={'a':'a'}).OutputType Traceback (most recent call last): File "<stdin>", line 1, in <module> File "D:\workspace\venv\lib\site-packages\langchain_core\output_parsers\base.py", line 160, in OutputType raise TypeError( TypeError: Runnable RegexDictParser doesn't have an inferable OutputType. Override the OutputType property to specify the output type. ``` ### `RetryOutputParser` ```python >>> output_parsers.RetryOutputParser(parser=output_parsers.DatetimeOutputParser()).OutputType ~T ``` ### `RetryWithErrorOutputParser` ```python >>> output_parsers.RetryWithErrorOutputParser(parser=output_parsers.DatetimeOutputParser()).OutputType ~T ``` ### `StructuredOutputParser` ```python >>> from langchain.output_parsers.structured import ResponseSchema >>> response_schemas = [ResponseSchema(name="foo",description="a list of strings",type="List[string]"),ResponseSchema(name="bar",description="a string",type="string"), ] >>> output_parsers.StructuredOutputParser.from_response_schemas(response_schemas).OutputType Traceback (most recent call last): File "<stdin>", line 1, in <module> File "D:\workspace\venv\lib\site-packages\langchain_core\output_parsers\base.py", line 160, in OutputType raise TypeError( TypeError: Runnable StructuredOutputParser doesn't have an inferable OutputType. Override the OutputType property to specify the output type. ``` ### `YamlOutputParser` ```python >>> output_parsers.YamlOutputParser(pydantic_object=MyModel).OutputType ~T ``` <div> --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-06-18 18:59:42 +00:00
Artem Mukhin	e271f75bee	docs: Fix URL formatting in deprecation warnings (#23075 ) Description Updated the URLs in deprecation warning messages. The URLs were previously written as raw strings and are now formatted to be clickable HTML links. Example of a broken link in the current API Reference: https://api.python.langchain.com/en/latest/chains/langchain.chains.openai_functions.extraction.create_extraction_chain_pydantic.html <img width="942" alt="Screenshot 2024-06-18 at 13 21 07" src="https://github.com/langchain-ai/langchain/assets/4854600/a1b1863c-cd03-4af2-a9bc-70375407fb00">	2024-06-18 14:49:58 -04:00
Gabriel Petracca	c6660df58e	community[minor]: Implement Doctran async execution (#22372 ) Description The DoctranTextTranslator has an async transform function that was not implemented because [the doctran library](https://github.com/psychic-api/doctran) uses a sync version of the `execute` method. - I implemented the `DoctranTextTranslator.atransform_documents()` method using `asyncio.to_thread` to run the function in a separate thread. - I updated the example in the Notebook with the new async version. - The performance improvements can be appreciated when a big document is divided into multiple chunks. Relates to: - Issue #14645: https://github.com/langchain-ai/langchain/issues/14645 - Issue #14437: https://github.com/langchain-ai/langchain/issues/14437 - https://github.com/langchain-ai/langchain/pull/15264 --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>	2024-06-18 18:17:37 +00:00
Eugene Yurtsev	aa6415aa7d	core[minor]: Support multiple keys in get_from_dict_or_env (#23086 ) Support passing multiple keys for ge_from_dict_or_env	2024-06-18 14:13:28 -04:00
nold	226802f0c4	community: add args_schema to SearxSearch (#22954 ) This change adds args_schema (pydantic BaseModel) to SearxSearchRun for correct schema formatting on LLM function calls Issue: currently using SearxSearchRun with OpenAI function calling returns the following error "TypeError: SearxSearchRun._run() got an unexpected keyword argument '__arg1' ". This happens because the schema sent to the LLM is "input: '{"__arg1":"foobar"}'" while the method should be called with the "query" parameter. --------- Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>	2024-06-18 17:27:39 +00:00
Bagatur	01783d67fc	core[patch]: Release 0.2.9 (#23091 )	2024-06-18 17:15:04 +00:00
Finlay Macklon	616d06d7fe	community: glob multiple patterns when using DirectoryLoader (#22852 ) - Description: Updated community.langchain_community.document_loaders.directory.py to enable the use of multiple glob patterns in the `DirectoryLoader` class. Now, the glob parameter is of type `list[str] \| str` and still defaults to the same value as before. I updated the docstring of the class to reflect this, and added a unit test to community.tests.unit_tests.document_loaders.test_directory.py named `test_directory_loader_glob_multiple`. This test also shows an example of how to use the new functionality. - ~~Issue:~~Discussion Thread: https://github.com/langchain-ai/langchain/discussions/18559 - Dependencies: None - Twitter handle: N/a - [x] Add tests and docs - Added test (described above) - Updated class docstring - [x] Lint and test --------- Co-authored-by: isaac hershenson <ihershenson@hmc.edu> Co-authored-by: Harrison Chase <hw.chase.17@gmail.com> Co-authored-by: Isaac Francisco <78627776+isahers1@users.noreply.github.com>	2024-06-18 09:24:50 -07:00
Eugene Yurtsev	5564d9e404	core[patch]: Document BaseStore (#23082 ) Add doc-string to BaseStore	2024-06-18 11:47:47 -04:00
Takuya Igei	9f791b6ad5	core[patch],community[patch],langchain[patch]: `tenacity` dependency to version `>=8.1.0,<8.4.0` (#22973 ) Fix https://github.com/langchain-ai/langchain/issues/22972. - [x] PR title: "package: description" - Where "package" is whichever of langchain, community, core, experimental, etc. is being modified. Use "docs: ..." for purely docs changes, "templates: ..." for template changes, "infra: ..." for CI changes. - Example: "community: add foobar LLM" - [x] PR message: *Delete this entire checklist* and replace with - Description: a description of the change - Issue: the issue # it fixes, if applicable - Dependencies: any dependencies required for this change - Twitter handle: if your PR gets announced, and you'd like a mention, we'll gladly shout you out! - [x] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. - [x] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17.	2024-06-18 10:34:28 -04:00
Raviraj	858ce264ef	SemanticChunker : Feature Addition ("Semantic Splitting with gradient") (#22895 ) ```SemanticChunker``` currently provide three methods to split the texts semantically: - percentile - standard_deviation - interquartile I propose new method ```gradient```. In this method, the gradient of distance is used to split chunks along with the percentile method (technically) . This method is useful when chunks are highly correlated with each other or specific to a domain e.g. legal or medical. The idea is to apply anomaly detection on gradient array so that the distribution become wider and easy to identify boundaries in highly semantic data. I have tested this merge on a set of 10 domain specific documents (mostly legal). Details : - Issue: Improvement - Dependencies: NA - Twitter handle: [x.com/prajapat_ravi](https://x.com/prajapat_ravi) @hwchase17 --------- Co-authored-by: Raviraj Prajapat <raviraj.prajapat@sirionlabs.com> Co-authored-by: isaac hershenson <ihershenson@hmc.edu>	2024-06-17 21:01:08 -07:00
Raghav Dixit	55705c0f5e	LanceDB integration update (#22869 ) Added : - [x] relevance search (w/wo scores) - [x] maximal marginal search - [x] image ingestion - [x] filtering support - [x] hybrid search w reranking make test, lint_diff and format checked.	2024-06-17 20:54:26 -07:00
Chang Liu	62c8a67f56	community: add KafkaChatMessageHistory (#22216 ) Add chat history store based on Kafka. Files added: `libs/community/langchain_community/chat_message_histories/kafka.py` `docs/docs/integrations/memory/kafka_chat_message_history.ipynb` New issue to be created for future improvement: 1. Async method implementation. 2. Message retrieval based on timestamp. 3. Support for other configs when connecting to cloud hosted Kafka (e.g. add `api_key` field) 4. Improve unit testing & integration testing.	2024-06-17 20:34:01 -07:00
shimajiroxyz	3e835a1aa1	langchain: add id_key option to EnsembleRetriever for metadata-based document merging (#22950 ) Description: - What I changed - By specifying the `id_key` during the initialization of `EnsembleRetriever`, it is now possible to determine which documents to merge scores for based on the value corresponding to the `id_key` element in the metadata, instead of `page_content`. Below is an example of how to use the modified `EnsembleRetriever`: ```python retriever = EnsembleRetriever(retrievers=[ret1, ret2], id_key="id") # The Document returned by each retriever must keep the "id" key in its metadata. ``` - Additionally, I added a script to easily test the behavior of the `invoke` method of the modified `EnsembleRetriever`. - Why I changed - There are cases where you may want to calculate scores by treating Documents with different `page_content` as the same when using `EnsembleRetriever`. For example, when you want to ensemble the search results of the same document described in two different languages. - The previous `EnsembleRetriever` used `page_content` as the basis for score aggregation, making the above usage difficult. Therefore, the score is now calculated based on the specified key value in the Document's metadata. Twitter handle: @shimajiroxyz	2024-06-18 03:29:17 +00:00
mackong	39f6c4169d	langchain[patch]: add tool messages formatter for tool calling agent (#22849 ) - Description: add tool_messages_formatter for tool calling agent, make tool messages can be formatted in different ways for your LLM. - Issue: N/A - Dependencies: N/A	2024-06-17 20:29:00 -07:00

1 2 3 4 5 ...

4614 Commits