**Description:** Circular dependencies when parsing references leading
to `RecursionError: maximum recursion depth exceeded` issue. This PR
address the issue by handling previously seen refs as in any typical DFS
to avoid infinite depths.
**Issue:** https://github.com/langchain-ai/langchain/issues/12163
**Twitter handle:** https://twitter.com/theBhulawat
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
**Description:** Refactor code of FAISS vectorcstore and update the
related documentation.
Details:
- replace `.format()` with f-strings for strings formatting;
- refactor definition of a filtering function to make code more readable
and more flexible;
- slightly improve efficiency of
`max_marginal_relevance_search_with_score_by_vector` method by removing
unnecessary looping over the same elements;
- slightly improve efficiency of `delete` method by using set data
structure for checking if the element was already deleted;
**Issue:** fix small inconsistency in the documentation (the old example
was incorrect and unappliable to faiss vectorstore)
**Dependencies:** basic langchain-community dependencies and `faiss`
(for CPU or for GPU)
**Twitter handle:** antonenkodev
Issue : _call method of LLMRouterChain uses predict_and_parse, which is
slated for deprecation.
Description : Instead of using predict_and_parse, this replaces it with
individual predict and parse functions.
**Description:** Update the docstring of OpenAI, OpenAIEmbeddings and
ChatOpenAI classes
**Issue:** Update import module paths to the current LangChain API
**Dependencies:** None
**Lint and test**: `make format` and `make lint` were run
This incorporates the review comments from langchain-ai/langchain#18637
which I closed due to an issue I had in updating that pr branch
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
**Description:** Update AnthropicLLM deprecation message import path for
ChatAnthropic
**Issue:** Incorrect import path in deprecation message
**Dependencies:** None
**Lint and test**: `make format`, `make lint` and `make test` were run
This PR updates the on_tool_end handlers to return the raw output from the tool instead of casting it to a string.
This is technically a breaking change, though it's impact is expected to be somewhat minimal. It will fix behavior in `astream_events` as well.
Fixes the following issue #18760 raised by @eyurtsev
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
BasePDFLoader doesn't parse the suffix of the file correctly when
parsing S3 presigned urls. This fix enables the proper detection and
parsing of S3 presigned URLs to prevent errors such as `OSError: [Errno
36] File name too long`.
No additional dependencies required.
Deduplicate documents using MD5 of the page_content. Also allows for
custom deduplication with graph ingestion method by providing metadata
id attribute
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
- **Description:** Adding an optional parameter `linearization_config`
to the `AmazonTextractPDFLoader` so the caller can define how the output
will be linearized, instead of forcing a predefined set of linearization
configs. It will still have a default configuration as this will be an
optional parameter.
- **Issue:** #17457
- **Dependencies:** The same ones that already exist for
`AmazonTextractPDFLoader`
- **Twitter handle:** [@lvieirajr19](https://twitter.com/lvieirajr19)
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
*Description**: My previous
[PR](https://github.com/langchain-ai/langchain/pull/18521) was
mistakenly closed, so I am reopening this one. Context: AWS released two
Mistral models on Bedrock last Friday (March 1, 2024). This PR includes
some code adjustments to ensure their compatibility with the Bedrock
class.
---------
Co-authored-by: Anis ZAKARI <anis.zakari@hymaia.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
- **Description:** Update azuresearch vectorstore from_texts() method to
include fields argument, necessary for creating an Azure AI Search index
with custom fields.
- **Issue:** Currently index fields are fixed to default fields if Azure
Search index is created using from_texts() method
- **Dependencies:** None
- **Twitter handle:** None
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Small improvement to the openapi prompt.
The agent was not finding the server base URL (looping through all
nodes). This small change narrows the search and enables finding the url
faster.
No dependency
Twitter : @al1pra
# Proper example for AzureOpenAI usage in error message
The original error message is wrong in part of a usage example it gives.
Corrected to the right one.
Co-authored-by: Dzmitry Kankalovich <dzmitry_kankalovich@epam.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:** `S3DirectoryLoader` is failing if prefix is a folder
(ex: `my_folder/`) because `S3FileLoader` will try to load that folder
and will fail. This PR skip nested directories so prefix can be set to
folder instead of `my_folder/files_prefix`.
- **Issue:**
- #11917
- #6535
- #4326
- **Dependencies:** none
- **Twitter handle:** @Falydoor
- [x] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
- [ ] Title: Mongodb: MongoDB connection performance improvement.
- [ ] Message:
- **Description:** I made collection index_creation as optional. Index
Creation is one time process.
- **Issue:** MongoDBChatMessageHistory class object is attempting to
create an index during connection, causing each request to take longer
than usual. This should be optional with a parameter.
- **Dependencies:** N/A
- **Branch to be checked:** origin/mongo_index_creation
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
- **Description:** Add embedding instruction to
HuggingFaceBgeEmbeddings, so that it can be compatible with nomic and
other models that need embedding instruction.
---------
Co-authored-by: Tao Wu <tao.wu@rwth-aachen.de>
Co-authored-by: Bagatur <baskaryan@gmail.com>
_generate() and _agenerate() both accept **kwargs, then pass them on to
_format_output; but _format_output doesn't accept **kwargs. Attempting
to pass, e.g.,
timeout=50
to _generate (or invoke()) results in a TypeError.
Thank you for contributing to LangChain!
- [ ] **PR title**: "package: description"
- Where "package" is whichever of langchain, community, core,
experimental, etc. is being modified. Use "docs: ..." for purely docs
changes, "templates: ..." for template changes, "infra: ..." for CI
changes.
- Example: "community: add foobar LLM"
- [ ] **PR message**: ***Delete this entire checklist*** and replace
with
- **Description:** a description of the change
- **Issue:** the issue # it fixes, if applicable
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** if your PR gets announced, and you'd like a
mention, we'll gladly shout you out!
- [ ] **Add tests and docs**: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
- [ ] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
## Add Passio Nutrition AI Food Search Tool to Community Package
### Description
We propose adding a new tool to the `community` package, enabling
integration with Passio Nutrition AI for food search functionality. This
tool will provide a simple interface for retrieving nutrition facts
through the Passio Nutrition AI API, simplifying user access to
nutrition data based on food search queries.
### Implementation Details
- **Class Structure:** Implement `NutritionAI`, extending `BaseTool`. It
includes an `_run` method that accepts a query string and, optionally, a
`CallbackManagerForToolRun`.
- **API Integration:** Use `NutritionAIAPI` for the API wrapper,
encapsulating all interactions with the Passio Nutrition AI and
providing a clean API interface.
- **Error Handling:** Implement comprehensive error handling for API
request failures.
### Expected Outcome
- **User Benefits:** Enable easy querying of nutrition facts from Passio
Nutrition AI, enhancing the utility of the `langchain_community` package
for nutrition-related projects.
- **Functionality:** Provide a straightforward method for integrating
nutrition information retrieval into users' applications.
### Dependencies
- `langchain_core` for base tooling support
- `pydantic` for data validation and settings management
- Consider `requests` or another HTTP client library if not covered by
`NutritionAIAPI`.
### Tests and Documentation
- **Unit Tests:** Include tests that mock network interactions to ensure
tool reliability without external API dependency.
- **Documentation:** Create an example notebook in
`docs/docs/integrations/tools/passio_nutrition_ai.ipynb` showing usage,
setup, and example queries.
### Contribution Guidelines Compliance
- Adhere to the project's linting and formatting standards (`make
format`, `make lint`, `make test`).
- Ensure compliance with LangChain's contribution guidelines,
particularly around dependency management and package modifications.
### Additional Notes
- Aim for the tool to be a lightweight, focused addition, not
introducing significant new dependencies or complexity.
- Potential future enhancements could include caching for common queries
to improve performance.
### Twitter Handle
- Here is our Passio AI [twitter handle](https://twitter.com/@passio_ai)
where we announce our products.
If no one reviews your PR within a few days, please @-mention one of
baskaryan, efriis, eyurtsev, hwchase17.
"community: added a feature to filter documents in Mongoloader"
- **Description:** added a feature to filter documents in Mongoloader
- **Feature:** the feature #18251
- **Dependencies:** No
- **Twitter handle:** https://twitter.com/im_Kushagra
For some DBs with lots of tables, reflection of all the tables can take
very long. So this change will make the tables be reflected lazily when
get_table_info() is called and `lazy_table_reflection` is True.
Allows all chat models that implement _stream, but not _astream to still have async streaming to work.
Amongst other things this should resolve issues with streaming community model implementations through langserve since langserve is exclusively async.
**Description:** Replacing the deprecated predict() and apredict()
methods in the unit tests
**Issue:** Not applicable
**Dependencies:** None
**Lint and test**: `make format`, `make lint` and `make test` have been
run
Make `ElasticsearchRetriever` available as top-level import.
The `langchain` package depends on `langchain-community` so we do not
need to depend on it explicitly.
## Description
- Add [Friendli](https://friendli.ai/) integration for `Friendli` LLM
and `ChatFriendli` chat model.
- Unit tests and integration tests corresponding to this change are
added.
- Documentations corresponding to this change are added.
## Dependencies
- Optional dependency
[`friendli-client`](https://pypi.org/project/friendli-client/) package
is added only for those who use `Frienldi` or `ChatFriendli` model.
## Twitter handle
- https://twitter.com/friendliai
This pull request introduces initial support for the TiDB vector store.
The current version is basic, laying the foundation for the vector store
integration. While this implementation provides the essential features,
we plan to expand and improve the TiDB vector store support with
additional enhancements in future updates.
Upcoming Enhancements:
* Support for Vector Index Creation: To enhance the efficiency and
performance of the vector store.
* Support for max marginal relevance search.
* Customized Table Structure Support: Recognizing the need for
flexibility, we plan for more tailored and efficient data store
solutions.
Simple use case exmaple
```python
from typing import List, Tuple
from langchain.docstore.document import Document
from langchain_community.vectorstores import TiDBVectorStore
from langchain_openai import OpenAIEmbeddings
db = TiDBVectorStore.from_texts(
embedding=embeddings,
texts=['Andrew like eating oranges', 'Alexandra is from England', 'Ketanji Brown Jackson is a judge'],
table_name="tidb_vector_langchain",
connection_string=tidb_connection_url,
distance_strategy="cosine",
)
query = "Can you tell me about Alexandra?"
docs_with_score: List[Tuple[Document, float]] = db.similarity_search_with_score(query)
for doc, score in docs_with_score:
print("-" * 80)
print("Score: ", score)
print(doc.page_content)
print("-" * 80)
```
## **Description:**
MongoDB integration tests link to a provided Atlas Cluster. We have very
stringent permissions set against the cluster provided. In order to make
it easier to track and isolate the collections each test gets run
against, we've updated the collection names to map the test file name.
i.e. `langchain_{filename}` => `langchain_test_vectorstores`
Fixes integration test results
![image](https://github.com/langchain-ai/langchain/assets/2887713/41f911b9-55f7-4fe4-9134-5514b82009f9)
## **Dependencies:**
Provided MONGODB_ATLAS_URI
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
cc: @shaneharvey, @blink1073 , @NoahStapp , @caseyclements
- **Description:** Chroma use uuid4 instead of uuid1 as random ids. Use
uuid1 may leak mac address, changing to uuid4 will not cause other
effects.
- **Issue:** None
- **Dependencies:** None
- **Twitter handle:** None
Fixes#18513.
## Description
This PR attempts to fix the support for Anthropic Claude v3 models in
BedrockChat LLM. The changes here has updated the payload to use the
`messages` format instead of the formatted text prompt for all models;
`messages` API is backwards compatible with all models in Anthropic, so
this should not break the experience for any models.
## Notes
The PR in the current form does not support the v3 models for the
non-chat Bedrock LLM. This means, that with these changes, users won't
be able to able to use the v3 models with the Bedrock LLM. I can open a
separate PR to tackle this use-case, the intent here was to get this out
quickly, so users can start using and test the chat LLM. The Bedrock LLM
classes have also grown complex with a lot of conditions to support
various providers and models, and is ripe for a refactor to make future
changes more palatable. This refactor is likely to take longer, and
requires more thorough testing from the community. Credit to PRs
[18579](https://github.com/langchain-ai/langchain/pull/18579) and
[18548](https://github.com/langchain-ai/langchain/pull/18548) for some
of the code here.
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
**Description:**
This integrates Infinispan as a vectorstore.
Infinispan is an open-source key-value data grid, it can work as single
node as well as distributed.
Vector search is supported since release 15.x
For more: [Infinispan Home](https://infinispan.org)
Integration tests are provided as well as a demo notebook
Follow up on https://github.com/langchain-ai/langchain/pull/17467.
- Update all references to the Elasticsearch classes to use the partners
package.
- Deprecate community classes.
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
ValidationError: 2 validation errors for DocArrayDoc
text
Field required [type=missing, input_value={'embedding': [-0.0191128...9, 0.01005221541175212]}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.5/v/missing
metadata
Field required [type=missing, input_value={'embedding': [-0.0191128...9, 0.01005221541175212]}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.5/v/missing
```
In the `_get_doc_cls` method, the `DocArrayDoc` class is defined as
follows:
```python
class DocArrayDoc(BaseDoc):
text: Optional[str]
embedding: Optional[NdArray] = Field(**embeddings_params)
metadata: Optional[dict]
```
This is a PR that adds a dangerous load parameter to force users to opt in to use pickle.
This is a PR that's meant to raise user awareness that the pickling module is involved.
This is a patch for `CVE-2024-2057`:
https://www.cve.org/CVERecord?id=CVE-2024-2057
This affects users that:
* Use the `TFIDFRetriever`
* Attempt to de-serialize it from an untrusted source that contains a
malicious payload
- **Description:** Databricks SerDe uses cloudpickle instead of pickle
when serializing a user-defined function transform_input_fn since pickle
does not support functions defined in `__main__`, and cloudpickle
supports this.
- **Dependencies:** cloudpickle>=2.0.0
Added a unit test.
Description:
This pull request addresses two key improvements to the langchain
repository:
**Fix for Crash in Flight Search Interface**:
Previously, the code would crash when encountering a failure scenario in
the flight ticket search interface. This PR resolves this issue by
implementing a fix to handle such scenarios gracefully. Now, the code
handles failures in the flight search interface without crashing,
ensuring smoother operation.
**Documentation Update for Amadeus Toolkit**:
Prior to this update, examples provided in the documentation for the
Amadeus Toolkit were unable to run correctly due to outdated
information. This PR includes an update to the documentation, ensuring
that all examples can now be executed successfully. With this update,
users can effectively utilize the Amadeus Toolkit with accurate and
functioning examples.
These changes aim to enhance the reliability and usability of the
langchain repository by addressing issues related to error handling and
ensuring that documentation remains up-to-date and actionable.
Issue: https://github.com/langchain-ai/langchain/issues/17375
Twitter Handle: SingletonYxx
### Description
Changed the value specified for `content_key` in JSONLoader from a
single key to a value based on jq schema.
I created [similar
PR](https://github.com/langchain-ai/langchain/pull/11255) before, but it
has several conflicts because of the architectural change associated
stable version release, so I re-create this PR to fit new architecture.
### Why
For json data like the following, specify `.data[].attributes.message`
for page_content and `.data[].attributes.id` or
`.data[].attributes.attributes. tags`, etc., the `content_key` must also
parse the json structure.
<details>
<summary>sample json data</summary>
```json
{
"data": [
{
"attributes": {
"message": "message1",
"tags": [
"tag1"
]
},
"id": "1"
},
{
"attributes": {
"message": "message2",
"tags": [
"tag2"
]
},
"id": "2"
}
]
}
```
</details>
<details>
<summary>sample code</summary>
```python
def metadata_func(record: dict, metadata: dict) -> dict:
metadata["source"] = None
metadata["id"] = record.get("id")
metadata["tags"] = record["attributes"].get("tags")
return metadata
sample_file = "sample1.json"
loader = JSONLoader(
file_path=sample_file,
jq_schema=".data[]",
content_key=".attributes.message", ## content_key is parsable into jq schema
is_content_key_jq_parsable=True, ## this is added parameter
metadata_func=metadata_func
)
data = loader.load()
data
```
</details>
### Dependencies
none
### Twitter handle
[kzk_maeda](https://twitter.com/kzk_maeda)
Neo4j tools use particular node labels and relationship types to store
metadata, but are irrelevant for text2cypher or graph generation, so we
want to ignore them in the schema representation.
Deprecates the old langchain-hub repository. Does *not* deprecate the
new https://smith.langchain.com/hub
@PinkDraconian has correctly raised that in the event someone is loading
unsanitized user input into the `try_load_from_hub` function, they have
the ability to load files from other locations in github than the
hwchase17/langchain-hub repository.
This PR adds some more path checking to that function and deprecates the
functionality in favor of the hub built into LangSmith.
## Description
Adding in Unit Test variation for `MongoDBChatMessageHistory` package
Follow-up to #18590
- [x] **Add tests and docs**: Unit test is what's being added
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
## **Description**
Migrate the `MongoDBChatMessageHistory` to the managed
`langchain-mongodb` partner-package
## **Dependencies**
None
## **Twitter handle**
@mongodb
## **tests and docs**
- [x] Migrate existing integration test
- [x ]~ Convert existing integration test to a unit test~ Creation is
out of scope for this ticket
- [x ] ~Considering delaying work until #17470 merges to leverage the
`MockCollection` object. ~
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/
---------
Co-authored-by: Erick Friis <erick@langchain.dev>
# Description
- **Description:** Adding MongoDB LLM Caching Layer abstraction
- **Issue:** N/A
- **Dependencies:** None
- **Twitter handle:** @mongodb
Checklist:
- [x] PR title: Please title your PR "package: description", where
"package" is whichever of langchain, community, core, experimental, etc.
is being modified. Use "docs: ..." for purely docs changes, "templates:
..." for template changes, "infra: ..." for CI changes.
- Example: "community: add foobar LLM"
- [x] PR Message (above)
- [x] Pass lint and test: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified to check that you're
passing lint and testing. See contribution guidelines for more
information on how to write/run tests, lint, etc:
https://python.langchain.com/docs/contributing/
- [ ] Add tests and docs: If you're adding a new integration, please
include
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in
`docs/docs/integrations` directory.
Additional guidelines:
- Make sure optional dependencies are imported within a function.
- Please do not add dependencies to pyproject.toml files (even optional
ones) unless they are required for unit tests.
- Most PRs should not touch more than one package.
- Changes should be backwards compatible.
- If you are adding something to community, do not re-import it in
langchain.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @efriis, @eyurtsev, @hwchase17.
---------
Co-authored-by: Jib <jib@byblack.us>
Thank you for contributing to LangChain!
- [ ] **PR title**: "community: deprecate vectorstores.MatchingEngine"
- [ ] **PR message**:
- **Description:** announced a deprecation since this integration has
been moved to langchain_google_vertexai
**Description:** Update docstrings of ChatAnthropic class
**Issue:** Change to ChatAnthropic from ChatAnthropicMessages
**Dependencies:** None
**Lint and test**: `make format`, `make lint` and `make test` passed
- **Description:** finishes adding the you.com functionality including:
- add async functions to utility and retriever
- add the You.com Tool
- add async testing for utility, retriever, and tool
- add a tool integration notebook page
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** @scottnath
- **Description:** add tools_renderer for various non-openai agents,
make tools can be render in different ways for your LLM.
- **Issue:** N/A
- **Dependencies:** N/A
---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
Description:
This pull request introduces several enhancements for Azure Cosmos
Vector DB, primarily focused on improving caching and search
capabilities using Azure Cosmos MongoDB vCore Vector DB. Here's a
summary of the changes:
- **AzureCosmosDBSemanticCache**: Added a new cache implementation
called AzureCosmosDBSemanticCache, which utilizes Azure Cosmos MongoDB
vCore Vector DB for efficient caching of semantic data. Added
comprehensive test cases for AzureCosmosDBSemanticCache to ensure its
correctness and robustness. These tests cover various scenarios and edge
cases to validate the cache's behavior.
- **HNSW Vector Search**: Added HNSW vector search functionality in the
CosmosDB Vector Search module. This enhancement enables more efficient
and accurate vector searches by utilizing the HNSW (Hierarchical
Navigable Small World) algorithm. Added corresponding test cases to
validate the HNSW vector search functionality in both
AzureCosmosDBSemanticCache and AzureCosmosDBVectorSearch. These tests
ensure the correctness and performance of the HNSW search algorithm.
- **LLM Caching Notebook** - The notebook now includes a comprehensive
example showcasing the usage of the AzureCosmosDBSemanticCache. This
example highlights how the cache can be employed to efficiently store
and retrieve semantic data. Additionally, the example provides default
values for all parameters used within the AzureCosmosDBSemanticCache,
ensuring clarity and ease of understanding for users who are new to the
cache implementation.
@hwchase17,@baskaryan, @eyurtsev,