Commit Graph

573 Commits (97de498d393cb6687421ee3f61d85c86098cf5f5)

Author SHA1 Message Date
Christophe Bornet 5985454269
Merge pull request #18539
* Implement lazy_load() for GitLoader
7 months ago
Christophe Bornet 9a6f7e213b
Merge pull request #18423
* Implement lazy_load() for BSHTMLLoader
7 months ago
Christophe Bornet b3a0c44838
Merge pull request #18673
* Implement lazy_load() for PDFMinerPDFasHTMLLoader and PyMuPDFLoader
7 months ago
Christophe Bornet 68fc0cf909
Merge pull request #18674
* Implement lazy_load() for TextLoader
7 months ago
Christophe Bornet 5b92f962f1
Merge pull request #18671
* Implement lazy_load() for MastodonTootsLoader
7 months ago
Christophe Bornet 15b1770326
Merge pull request #18421
* Implement lazy_load() for AssemblyAIAudioTranscriptLoader
7 months ago
Christophe Bornet bb284eebe4
Merge pull request #18436
* Implement lazy_load() for ConfluenceLoader
7 months ago
Christophe Bornet 691480f491
Merge pull request #18647
* Implement lazy_load() for UnstructuredBaseLoader
7 months ago
Christophe Bornet 52ac67c5d8
Merge pull request #18654
* Implement lazy_load() for ObsidianLoader
7 months ago
Christophe Bornet b9c0cf9025
Merge pull request #18656
* Implement lazy_load() for PsychicLoader
7 months ago
Christophe Bornet aa7ac57b67
community: Implement lazy_load() for TrelloLoader (#18658)
Covered by `tests/unit_tests/document_loaders/test_trello.py`
7 months ago
Christophe Bornet 302985fea1
community: Implement lazy_load() for SlackDirectoryLoader (#18675)
Integration tests:
`tests/integration_tests/document_loaders/test_slack.py`
7 months ago
Christophe Bornet ed36f9f604
community: Implement lazy_load() for WhatsAppChatLoader (#18677)
Integration test:
`tests/integration_tests/document_loaders/test_whatsapp_chat.py`
7 months ago
Christophe Bornet f414f5cdb9
community[minor]: Implement lazy_load() for WikipediaLoader (#18680)
Integration test:
`tests/integration_tests/document_loaders/test_wikipedia.py`
7 months ago
Bagatur 4cbfeeb1c2
community[patch]: Release 0.0.26 (#18683) 7 months ago
Christophe Bornet 1100f8de7a
community[minor]: Implement lazy_load() for ArxivLoader (#18664)
Integration tests: `tests/integration_tests/utilities/test_arxiv.py` and
`tests/integration_tests/document_loaders/test_arxiv.py`
7 months ago
Christophe Bornet 2d96803ddd
community[minor]: Implement lazy_load() for OutlookMessageLoader (#18668)
Integration test:
`tests/integration_tests/document_loaders/test_email.py`
7 months ago
Christophe Bornet ae167fb5b2
community[minor]: Implement lazy_load() for SitemapLoader (#18667)
Integration tests: `test_sitemap.py` and `test_docusaurus.py`
7 months ago
Christophe Bornet 623dfcc55c
community[minor]: Implement lazy_load() for FacebookChatLoader (#18669)
Integration test:
`tests/integration_tests/document_loaders/test_facebook_chat.py`
7 months ago
Christophe Bornet 20794bb889
community[minor]: Implement lazy_load() for GitbookLoader (#18670)
Integration test:
`tests/integration_tests/document_loaders/test_gitbook.py`
7 months ago
Liang Zhang 81985b31e6
community[patch]: Databricks SerDe uses cloudpickle instead of pickle (#18607)
- **Description:** Databricks SerDe uses cloudpickle instead of pickle
when serializing a user-defined function transform_input_fn since pickle
does not support functions defined in `__main__`, and cloudpickle
supports this.
- **Dependencies:** cloudpickle>=2.0.0

Added a unit test.
7 months ago
Christophe Bornet 7d6de96186
community[patch]: Implement lazy_load() for CubeSemanticLoader (#18535)
Covered by `test_cube_semantic.py`
7 months ago
Christophe Bornet a6b5d45e31
community[patch]: Implement lazy_load() for EverNoteLoader (#18538)
Covered by `test_evernote_loader.py`
7 months ago
Sunchao Wang dc81dba6cf
community[patch]: Improve amadeus tool and doc (#18509)
Description:

This pull request addresses two key improvements to the langchain
repository:

**Fix for Crash in Flight Search Interface**:

Previously, the code would crash when encountering a failure scenario in
the flight ticket search interface. This PR resolves this issue by
implementing a fix to handle such scenarios gracefully. Now, the code
handles failures in the flight search interface without crashing,
ensuring smoother operation.

**Documentation Update for Amadeus Toolkit**:

Prior to this update, examples provided in the documentation for the
Amadeus Toolkit were unable to run correctly due to outdated
information. This PR includes an update to the documentation, ensuring
that all examples can now be executed successfully. With this update,
users can effectively utilize the Amadeus Toolkit with accurate and
functioning examples.
These changes aim to enhance the reliability and usability of the
langchain repository by addressing issues related to error handling and
ensuring that documentation remains up-to-date and actionable.

Issue: https://github.com/langchain-ai/langchain/issues/17375

Twitter Handle: SingletonYxx
7 months ago
Christophe Bornet f77f7dc3ec
community[patch]: Fix VectorStoreQATool (#18529)
Fix #18460
7 months ago
Dounx ad48f55357
community[minor]: add Yuque document loader (#17924)
This pull request support loading documents from Yuque with Langchain.

Yuque is a professional cloud-based knowledge base for team
collaboration in documentation.

Website: https://www.yuque.com
OpenAPI: https://www.yuque.com/yuque/developer/openapi
7 months ago
Kazuki Maeda 60c5d964a8
community[minor]: use jq schema for content_key in json_loader (#18003)
### Description
Changed the value specified for `content_key` in JSONLoader from a
single key to a value based on jq schema.
I created [similar
PR](https://github.com/langchain-ai/langchain/pull/11255) before, but it
has several conflicts because of the architectural change associated
stable version release, so I re-create this PR to fit new architecture.

### Why
For json data like the following, specify `.data[].attributes.message`
for page_content and `.data[].attributes.id` or
`.data[].attributes.attributes. tags`, etc., the `content_key` must also
parse the json structure.

<details>
<summary>sample json data</summary>

```json
{
  "data": [
    {
      "attributes": {
        "message": "message1",
        "tags": [
          "tag1"
        ]
      },
      "id": "1"
    },
    {
      "attributes": {
        "message": "message2",
        "tags": [
          "tag2"
        ]
      },
      "id": "2"
    }
  ]
}
```

</details>

<details>
<summary>sample code</summary>

```python
def metadata_func(record: dict, metadata: dict) -> dict:

    metadata["source"] = None
    metadata["id"] = record.get("id")
    metadata["tags"] = record["attributes"].get("tags")

    return metadata

sample_file = "sample1.json"
loader = JSONLoader(
    file_path=sample_file,
    jq_schema=".data[]",
    content_key=".attributes.message", ## content_key is parsable into jq schema
    is_content_key_jq_parsable=True, ## this is added parameter
    metadata_func=metadata_func
)

data = loader.load()
data
```

</details>

### Dependencies
none

### Twitter handle
[kzk_maeda](https://twitter.com/kzk_maeda)
7 months ago
Hech 6a08134661
community[patch], langchain[minor]: Add retriever self_query and score_threshold in DingoDB (#18106) 7 months ago
Yudhajit Sinha 4570b477b9
community[patch]: Invoke callback prior to yielding token (titan_takeoff) (#18560)
## PR title
community[patch]: Invoke callback prior to yielding token

## PR message
- Description: Invoke callback prior to yielding token in _stream_
method in llms/titan_takeoff.
- Issue: #16913 
- Dependencies: None
7 months ago
Tomaz Bratanic ea51cdaede
Remove neo4j bloom labels from graph schema (#18564)
Neo4j tools use particular node labels and relationship types to store
metadata, but are irrelevant for text2cypher or graph generation, so we
want to ignore them in the schema representation.
7 months ago
Erick Friis e1924b3e93
core[patch]: deprecate hwchase17/langchain-hub, address path traversal (#18600)
Deprecates the old langchain-hub repository. Does *not* deprecate the
new https://smith.langchain.com/hub

@PinkDraconian has correctly raised that in the event someone is loading
unsanitized user input into the `try_load_from_hub` function, they have
the ability to load files from other locations in github than the
hwchase17/langchain-hub repository.

This PR adds some more path checking to that function and deprecates the
functionality in favor of the hub built into LangSmith.
7 months ago
Jib 9da1e0cf34
mongodb[patch]: Migrate MongoDBChatMessageHistory (#18590)
## **Description** 
Migrate the `MongoDBChatMessageHistory` to the managed
`langchain-mongodb` partner-package
## **Dependencies**
None
## **Twitter handle**
@mongodb

## **tests and docs**
- [x] Migrate existing integration test
- [x ]~ Convert existing integration test to a unit test~ Creation is
out of scope for this ticket
- [x ] ~Considering delaying work until #17470 merges to leverage the
`MockCollection` object. ~
- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
7 months ago
Tomaz Bratanic 353248838d
Add precedence for input params over env variables in neo4j integration (#18581)
input parameters take precedence over env variables
7 months ago
Christophe Bornet c8a171a154
community: Implement lazy_load() for GithubFileLoader (#18584) 7 months ago
Leonid Kuligin 04d134df17
marked MatchingEngine as deprecated (#18585)
Thank you for contributing to LangChain!

- [ ] **PR title**: "community: deprecate vectorstores.MatchingEngine"


- [ ] **PR message**: 
- **Description:** announced a deprecation since this integration has
been moved to langchain_google_vertexai
7 months ago
Erick Friis 343438e872
community[patch]: deprecate community fireworks (#18544) 7 months ago
Scott Nath b051bba1a9
community: Add you.com tool, add async to retriever, add async testing, add You tool doc (#18032)
- **Description:** finishes adding the you.com functionality including:
    - add async functions to utility and retriever
    - add the You.com Tool
    - add async testing for utility, retriever, and tool
    - add a tool integration notebook page
- **Dependencies:** any dependencies required for this change
- **Twitter handle:** @scottnath
7 months ago
William De Vena 275877980e
community[patch]: Invoke callback prior to yielding token (#18447)
## PR title
community[patch]: Invoke callback prior to yielding token

## PR message
Description: Invoke callback prior to yielding token in _stream method
in llms/vertexai.
Issue: https://github.com/langchain-ai/langchain/issues/16913
Dependencies: None
7 months ago
William De Vena 67375e96e0
community[patch]: Invoke callback prior to yielding token (#18448)
## PR title
community[patch]: Invoke callback prior to yielding token

## PR message
- Description: Invoke callback prior to yielding token in _stream method
in llms/tongyi.
- Issue: https://github.com/langchain-ai/langchain/issues/16913
- Dependencies: None
7 months ago
William De Vena 2087cbae64
community[patch]: Invoke callback prior to yielding token (#18449)
## PR title
community[patch]: Invoke callback prior to yielding token

## PR message
- Description: Invoke callback prior to yielding token in _stream method
in chat_models/perplexity.
- Issue: https://github.com/langchain-ai/langchain/issues/16913
- Dependencies: None
7 months ago
William De Vena eb04d0d3e2
community[patch]: Invoke callback prior to yielding token (#18452)
## PR title
community[patch]: Invoke callback prior to yielding token

## PR message
- Description: Invoke callback prior to yielding token in _stream and
_astream methods in llms/anthropic.
- Issue: https://github.com/langchain-ai/langchain/issues/16913
- Dependencies: None
7 months ago
William De Vena 371bec79bc
community[patch]: Invoke callback prior to yielding token (#18454)
## PR title
community[patch]: Invoke callback prior to yielding token

## PR message
- Description: Invoke callback prior to yielding token in _stream and
_astream methods in llms/baidu_qianfan_endpoint.
- Issue: https://github.com/langchain-ai/langchain/issues/16913
- Dependencies: None
7 months ago
Aayush Kataria 7c2f3f6f95
community[minor]: Adding Azure Cosmos Mongo vCore Vector DB Cache (#16856)
Description:

This pull request introduces several enhancements for Azure Cosmos
Vector DB, primarily focused on improving caching and search
capabilities using Azure Cosmos MongoDB vCore Vector DB. Here's a
summary of the changes:

- **AzureCosmosDBSemanticCache**: Added a new cache implementation
called AzureCosmosDBSemanticCache, which utilizes Azure Cosmos MongoDB
vCore Vector DB for efficient caching of semantic data. Added
comprehensive test cases for AzureCosmosDBSemanticCache to ensure its
correctness and robustness. These tests cover various scenarios and edge
cases to validate the cache's behavior.
- **HNSW Vector Search**: Added HNSW vector search functionality in the
CosmosDB Vector Search module. This enhancement enables more efficient
and accurate vector searches by utilizing the HNSW (Hierarchical
Navigable Small World) algorithm. Added corresponding test cases to
validate the HNSW vector search functionality in both
AzureCosmosDBSemanticCache and AzureCosmosDBVectorSearch. These tests
ensure the correctness and performance of the HNSW search algorithm.
- **LLM Caching Notebook** - The notebook now includes a comprehensive
example showcasing the usage of the AzureCosmosDBSemanticCache. This
example highlights how the cache can be employed to efficiently store
and retrieve semantic data. Additionally, the example provides default
values for all parameters used within the AzureCosmosDBSemanticCache,
ensuring clarity and ease of understanding for users who are new to the
cache implementation.
 
 @hwchase17,@baskaryan, @eyurtsev,
7 months ago
Erick Friis 1fd1ac8e95
community[patch]: release 0.0.25 (#18408) 7 months ago
Sourav Pradhan 50abeb7ed9
community[patch]: fix Chroma add_images (#17964)
###  Description

Fixed a small bug in chroma.py add_images(), previously whenever we are
not passing metadata the documents is containing the base64 of the uris
passed, but when we are passing the metadata the documents is containing
normal string uris which should not be the case.

### Issue

In add_images() method when we are calling upsert() we have to use
"b64_texts" instead of normal string "uris".

### Twitter handle

https://twitter.com/whitepegasus01
7 months ago
Kate Silverstein b7c71e2e07
community[minor]: llamafile embeddings support (#17976)
* **Description:** adds `LlamafileEmbeddings` class implementation for
generating embeddings using
[llamafile](https://github.com/Mozilla-Ocho/llamafile)-based models.
Includes related unit tests and notebook showing example usage.
* **Issue:** N/A
* **Dependencies:** N/A
7 months ago
Tomaz Bratanic f6bfb969ba
community[patch]: Add an option for indexed generic label when import neo4j graph documents (#18122)
Current implementation doesn't have an indexed property that would
optimize the import. I have added a `baseEntityLabel` parameter that
allows you to add a secondary node label, which has an indexed id
`property`. By default, the behaviour is identical to previous version.

Since multi-labeled nodes are terrible for text2cypher, I removed the
secondary label from schema representation object and string, which is
used in text2cypher.
7 months ago
Arun Sathiya 4adac20d7b
community[patch]: Make cohere_api_key a SecretStr (#12188)
This PR makes `cohere_api_key` in `llms/cohere` a SecretStr, so that the
API Key is not leaked when `Cohere.cohere_api_key` is represented as a
string.

---------

Signed-off-by: Arun <arun@arun.blog>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
7 months ago
Petteri Johansson 6c1989d292
community[minor], langchain[minor], docs: Gremlin Graph Store and QA Chain (#17683)
- **Description:** 
New feature: Gremlin graph-store and QA chain (including docs).
Compatible with Azure CosmosDB.
  - **Dependencies:** 
  no changes
7 months ago
Ather Fawaz a5ccf5d33c
community[minor]: Add support for Perplexity chat model(#17024)
- **Description:** This PR adds support for [Perplexity AI
APIs](https://blog.perplexity.ai/blog/introducing-pplx-api).
  - **Issues:** None
  - **Dependencies:** None
  - **Twitter handle:** [@atherfawaz](https://twitter.com/AtherFawaz)

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
7 months ago