Commit Graph

8459 Commits (ec7a59c96c5303986f29e513d0eb64541cae359b)

Author SHA1 Message Date
BeatrixCohere d1a2e194c3
cohere[patch]: misc fixes for tool use agent and cohere chat (#19705)
Bug fixes in this PR:
* allows params other than the input param (such as "message") to be
passed to the prompt for the cohere tools agent
* fixes to documents kwarg from messages
* fixes to tool_calls API call

---------

Co-authored-by: Harry M <127103098+harry-cohere@users.noreply.github.com>
5 months ago
ccurme b35e68c41f
docs: update use_cases/question_answering/chat_history (#19349)
Update following https://github.com/langchain-ai/langchain/issues/19344
5 months ago
Erick Friis 8c2ed85a45
core[patch], infra: release 0.1.36, run partner CI on core PRs (#19688) 5 months ago
Erick Friis 5327bc9ec4
elasticsearch[patch]: move to repo (#19620) 5 months ago
Nilanjan De 239dd7c0c0
langchain[patch]: Use map() and avoid "ValueError: max() arg is an empty sequence" in MergerRetriever (#18679)
- **Issue:** When passing an empty list to MergerRetriever it fails with
error: ValueError: max() arg is an empty sequence

- **Description:** We have a use case where we dynamically select
retrievers and use MergerRetriever for merging the output of the
retrievers. We faced this issue when the retriever_docs list is empty.
Added a default of 0 for the case where retriever_docs is an empty list,
to avoid "ValueError: max() arg is an empty sequence". Also switched to
map(), which is more than twice as fast as the current
implementation:
```python
import timeit
# Sample retriever_docs with varying lengths of sublists
retriever_docs = [[i for i in range(j)] for j in range(1, 1000)]
# First code snippet
code1 = '''
max_docs = max(len(docs) for docs in retriever_docs)
'''
# Second code snippet
code2 = '''
max_docs = max(map(len, retriever_docs), default=0)
'''
# Benchmarking
time1 = timeit.timeit(stmt=code1, globals=globals(), number=10000)
time2 = timeit.timeit(stmt=code2, globals=globals(), number=10000)
# Output
print(f"Execution time for code snippet 1: {time1} seconds")
print(f"Execution time for code snippet 2: {time2} seconds")
```

- **Dependencies:** none
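A minimal sketch of the empty-list case described above, using only built-ins (this is not MergerRetriever's actual code): the generator form raises on an empty `retriever_docs`, while `map()` with `default=0` returns 0.

```python
# An empty list of per-retriever results, as in the dynamic-selection case.
retriever_docs: list = []

# Original pattern: max() over an empty generator raises ValueError.
try:
    max(len(docs) for docs in retriever_docs)
except ValueError as exc:
    print("generator version failed:", exc)

# Patched pattern: map() plus default=0 returns 0 instead of raising.
max_docs = max(map(len, retriever_docs), default=0)
print("map version:", max_docs)  # 0

# With non-empty input both forms agree; default is only used when empty.
max_non_empty = max(map(len, [[1, 2], [3]]), default=0)
print(max_non_empty)  # 2
```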
5 months ago
aditya thomas 4cd38fe89f
docs: update docstring of the ChatGroq class (#18645)
**Description:** Update docstring of the ChatGroq class
**Issue:** Not applicable
**Dependencies:** None
5 months ago
Jaid e4d7b1a482
voyageai[patch]: top level reranker import (#19645)
The previous version didn't export the Voyage reranker from the package's `__init__` file.

- [ ] **PR title**: langchain_voyageai reranker is not working

- [ ] **PR message**:
    - **Description:** This fix lets you run the reranker from Voyage
    - **Issue:** Was not able to run the reranker from Voyage

@efriis
5 months ago
Xinwei Xiong 26eed70c11
infra: Optimize Makefile for Better Usability and Maintenance (#18859)
**Previous screenshots:**


![image](https://github.com/langchain-ai/langchain/assets/86140903/e2f326e3-4d97-4b22-aacb-e789a9d815e4)

**Current screenshot:**

![image](https://github.com/langchain-ai/langchain/assets/86140903/bd8a3ea7-1b8a-4803-9168-df45f6fa4893)
5 months ago
Juan Jose Miguel Ovalle Villamil 51baa1b5cf
langchain[patch]: fix-cohere-reranker-rerank-method with cohere v5 (#19486)
#### Description
Fixed the following error with `rerank` method from `CohereRerank`:
```
langchain/retrievers/document_compressors/cohere_rerank.py:
---> 79 results = self.client.rerank(
     80     query, docs, model, top_n=top_n, max_chunks_per_doc=max_chunks_per_doc
     81 )
     82 result_dicts = []
     83 for res in results.results:

TypeError: BaseCohere.rerank() takes 1 positional argument but 4 positional arguments (and 2 keyword-only arguments) were given
```
This was fixed by changing this:
```python
    def rerank(
        self,
        documents: Sequence[Union[str, Document, dict]],
        query: str,
        *,
        model: Optional[str] = None,
        top_n: Optional[int] = -1,
        max_chunks_per_doc: Optional[int] = None,
    ) -> List[Dict[str, Any]]:
        ...
        if len(documents) == 0:  # to avoid empty api call
            return []
        docs = [
            doc.page_content if isinstance(doc, Document) else doc for doc in documents
        ]
        model = model or self.model
        top_n = top_n if (top_n is None or top_n > 0) else self.top_n
        results = self.client.rerank(
            query, docs, model, top_n=top_n, max_chunks_per_doc=max_chunks_per_doc
        )
        result_dicts = []
        for res in results:
            result_dicts.append(
                {"index": res.index, "relevance_score": res.relevance_score}
            )
        return result_dicts
```
to this:
```python
    def rerank(
        self,
        documents: Sequence[Union[str, Document, dict]],
        query: str,
        *,
        model: Optional[str] = None,
        top_n: Optional[int] = -1,
        max_chunks_per_doc: Optional[int] = None,
    ) -> List[Dict[str, Any]]:
        ...
        if len(documents) == 0:  # to avoid empty api call
            return []
        docs = [
            doc.page_content if isinstance(doc, Document) else doc for doc in documents
        ]
        model = model or self.model
        top_n = top_n if (top_n is None or top_n > 0) else self.top_n
        results = self.client.rerank(
            # changed: pass every argument by keyword
            query=query,
            documents=docs,
            model=model,
            top_n=top_n,
            max_chunks_per_doc=max_chunks_per_doc,
        )
        result_dicts = []
        for res in results.results:  # changed: iterate over results.results
            result_dicts.append(
                {"index": res.index, "relevance_score": res.relevance_score}
            )
        return result_dicts
```
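To illustrate both changes in isolation, here is a sketch built on a hypothetical stand-in client (`FakeCohereClient`, `RerankResponse` and the toy word-overlap scoring are all illustrative, not the cohere SDK). It only mirrors the two properties visible in the traceback and fix above: `rerank()` accepts keyword arguments only, and the response wraps its items in `.results`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RerankResult:
    index: int
    relevance_score: float


@dataclass
class RerankResponse:
    # Items are wrapped in a `.results` attribute, as with the v5 response above.
    results: List[RerankResult]


class FakeCohereClient:
    """Hypothetical stand-in whose rerank() accepts keyword arguments only."""

    def rerank(
        self,
        *,
        query: str,
        documents: List[str],
        model: str,
        top_n: Optional[int] = None,
        max_chunks_per_doc: Optional[int] = None,
    ) -> RerankResponse:
        # Toy scoring by shared words; a real client calls the Rerank API.
        words = set(query.split())
        scored = [
            RerankResult(i, float(len(words & set(doc.split()))))
            for i, doc in enumerate(documents)
        ]
        scored.sort(key=lambda r: r.relevance_score, reverse=True)
        return RerankResponse(scored[:top_n] if top_n else scored)


client = FakeCohereClient()
docs = ["cats purr", "dogs bark loudly", "cats and dogs"]

# Positional call: raises TypeError, like the traceback above.
try:
    client.rerank("cats", docs, "some-model")
except TypeError as exc:
    print("positional call failed:", exc)

# Keyword call, then iterate over response.results rather than the response.
response = client.rerank(query="cats", documents=docs, model="some-model", top_n=2)
result_dicts = [
    {"index": r.index, "relevance_score": r.relevance_score}
    for r in response.results
]
print(result_dicts)
```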
#### Unit & Integration tests
I added a unit test to check the behaviour of `rerank`. Also fixed the
original integration test which was failing.

#### Format & Linting
Everything worked properly with `make lint_diff`, `make format_diff` and
`make format`. However, I noticed an error coming from another part of the
library when running `make lint`:

```
(langchain-py3.9) ➜  langchain git:(master) make format
[ "." = "" ] || poetry run ruff format .
1636 files left unchanged
[ "." = "" ] || poetry run ruff --select I --fix .
(langchain-py3.9) ➜  langchain git:(master) make lint
./scripts/check_pydantic.sh .
./scripts/lint_imports.sh
poetry run ruff .
[ "." = "" ] || poetry run ruff format . --diff
1636 files already formatted
[ "." = "" ] || poetry run ruff --select I .
[ "." = "" ] || mkdir -p .mypy_cache && poetry run mypy . --cache-dir .mypy_cache
langchain/agents/openai_assistant/base.py:252: error: Argument "file_ids" to "create" of "Assistants" has incompatible type "Optional[Any]"; expected "Union[list[str], NotGiven]"  [arg-type]
langchain/agents/openai_assistant/base.py:374: error: Argument "file_ids" to "create" of "AsyncAssistants" has incompatible type "Optional[Any]"; expected "Union[list[str], NotGiven]"  [arg-type]
Found 2 errors in 1 file (checked 1634 source files)
make: *** [Makefile:65: lint] Error 1
```

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Shuqian 332996b4b2
openai[patch]: fix ChatOpenAI model's openai proxy (#19559)
Due to changes in the OpenAI SDK, the previous method of setting the
OpenAI proxy in ChatOpenAI no longer works. This PR fixes this issue,
making the previous way of setting the OpenAI proxy in ChatOpenAI
effective again.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Bagatur b15c7fdde6
anthropic[patch]: fix response metadata type (#19683) 5 months ago
kaijietti 9c4b6dc979
community[patch]: fix bug in cohere that `async for` a coroutine in ChatCohere (#19381)
Without `await`, the `stream` returned from the `async_client` is
actually a coroutine, which could not be used in `async for`.
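A minimal sketch of the bug pattern, using a hypothetical `chat_stream()` coroutine in place of the real async client: a coroutine that *returns* an async stream must be awaited before `async for` can iterate it.

```python
import asyncio
from typing import AsyncIterator, List


async def _token_stream() -> AsyncIterator[str]:
    # Async generator standing in for the streamed chat response.
    for token in ["Hello", ",", " world"]:
        await asyncio.sleep(0)
        yield token


async def chat_stream() -> AsyncIterator[str]:
    # Hypothetical client method: a coroutine that returns a stream.
    return _token_stream()


async def main() -> List[str]:
    # Buggy: iterating the un-awaited coroutine raises TypeError.
    coro = chat_stream()
    try:
        async for _ in coro:
            pass
    except TypeError as exc:
        print("without await:", exc)
    finally:
        coro.close()  # avoid a "coroutine was never awaited" warning

    # Fixed: await first, then iterate the returned async stream.
    stream = await chat_stream()
    return [token async for token in stream]


tokens = asyncio.run(main())
print(tokens)
```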
5 months ago
Christian Galo 1adaa3c662
community[minor]: Update Azure Cognitive Services to Azure AI Services (#19488)
This is a follow up to #18371. These are the changes:
- New **Azure AI Services** toolkit and tools to replace those of
**Azure Cognitive Services**.
- Updated documentation for Microsoft platform.
- The image analysis tool has been rewritten to use the new package
`azure-ai-vision-imageanalysis`, doing a proper replacement of
`azure-ai-vision`.

These changes:
- Update outdated naming from "Azure Cognitive Services" to "Azure AI
Services".
- Update documentation to use non-deprecated methods to create and use
agents.
- Remove the dependency on the yanked Python package (`azure-ai-vision`)

There is one new dependency that is needed as a replacement to
`azure-ai-vision`:
- `azure-ai-vision-imageanalysis`. This dependency is optional and is
imported inside a function.
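Importing an optional dependency inside a function generally looks like the sketch below. The helper name `load_image_analysis_client` is hypothetical and this is not the toolkit's actual code. It assumes the package exposes `ImageAnalysisClient` from `azure.ai.vision.imageanalysis`. The import only runs when the tool is used, so LangChain itself can be imported without the package installed.

```python
from typing import Any


def load_image_analysis_client() -> Any:
    """Hypothetical helper: import the optional dependency at call time."""
    try:
        # Deferred import: only needed when the image analysis tool runs.
        from azure.ai.vision.imageanalysis import ImageAnalysisClient
    except ImportError as exc:
        raise ImportError(
            "azure-ai-vision-imageanalysis is not installed. "
            "Install it with `pip install azure-ai-vision-imageanalysis`."
        ) from exc
    return ImageAnalysisClient


try:
    client_cls = load_image_analysis_client()
    print("loaded", client_cls.__name__)
except ImportError as err:
    print(err)
```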

There is a new `azure_ai_services.ipynb` notebook showing usage. Changes
have been linted and formatted.

I am leaving the actions of adding deprecation notices and future
removal of Azure Cognitive Services up to the LangChain team, as I am
not sure what the current practice around this is.

---

If this PR makes it, my handle is @galo@mastodon.social

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
5 months ago
Shengsheng Huang ac1dd8ad94
community[minor]: migrate `bigdl-llm` to `ipex-llm` (#19518)
- **Description**: `bigdl-llm` library has been renamed to
[`ipex-llm`](https://github.com/intel-analytics/ipex-llm). This PR
migrates the `bigdl-llm` integration to `ipex-llm`.
- **Issue**: N/A. The original PR of `bigdl-llm` is
https://github.com/langchain-ai/langchain/pull/17953
- **Dependencies**: `ipex-llm` library
- **Contribution maintainer**: @shane-huang

Updated doc: docs/docs/integrations/llms/ipex_llm.ipynb
Updated test: libs/community/tests/integration_tests/llms/test_ipex_llm.py
5 months ago
Chaunte W. Lacewell a31f692f4e
community[minor]: Add VDMS vectorstore (#19551)
- **Description:** Add support for Intel Labs' [Visual Data Management
System (VDMS)](https://github.com/IntelLabs/vdms) as a vector store
- **Dependencies:** `vdms` library, which requires protobuf = "4.24.2".
There is a conflict with dashvector in the `langchain` package, but the
conflict is resolved in `community`.
- **Contribution maintainer:** [@cwlacewe](https://github.com/cwlacewe)
- **Added tests:**
libs/community/tests/integration_tests/vectorstores/test_vdms.py
- **Added docs:** docs/docs/integrations/vectorstores/vdms.ipynb
- **Added cookbook:** cookbook/multi_modal_RAG_vdms.ipynb

---------

Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
William FH b7b62e29fb
community[patch], mongodb[patch]: Stop spamming SIMD import warnings (#19531)
If you use an embedding dist function in an eval loop, you get warned
every time. Would prefer to just check once and forget about it.
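The PR's exact mechanism is not shown here. One common way to check once and then stay quiet is to cache the availability check, sketched below with `functools.lru_cache`. The backend module name is hypothetical, and the cosine distance is a plain-Python stand-in for an embedding distance function.

```python
import functools
import math
import warnings
from typing import List, Sequence


@functools.lru_cache(maxsize=None)
def _has_fast_backend() -> bool:
    """Check for an optional accelerated backend once and cache the answer."""
    try:
        import hypothetical_simd_backend  # noqa: F401  (hypothetical package name)
    except ImportError:
        warnings.warn("Accelerated backend not found; using pure-Python distance.")
        return False
    return True


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    _has_fast_backend()  # cached: the warning fires at most once per process
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


# Simulate an eval loop; "always" disables the warnings module's own dedup,
# so any suppression comes from the cached check alone.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    distances: List[float] = [
        cosine_distance([1.0, 0.0], [1.0, 1.0]) for _ in range(100)
    ]

print(len(caught))  # 1
```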

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Tomaz Bratanic b04e663426
experimental[patch]: Flatten relationships in LLM graph transformer (#19642) 5 months ago
billytrend-cohere 36abb5dd41
cohere[patch]: Fix positional argument (#19678)
cohere: Fix positional argument

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
5 months ago
Nuno Campos fdfb51ad8d
core: Two updates to chat model interface (#19684)
- .stream() and .astream() call on_llm_new_token, removing the need for
subclasses to do so. Backwards compatible because now we don't pass
run_manager into ._stream and ._astream
- .generate() and .agenerate() now handle `stream: bool` kwarg for
_generate and _agenerate. Subclasses handle this arg by delegating to
._stream(), now one less thing they need to do. Backwards compat because
this is an optional arg that we now never pass to the subclasses
- .generate() and .agenerate() now inspect callback handlers to decide
on a default value for stream:bool if not passed in. This auto enables
streaming when using astream_events and astream_log
- as a result of these three changes any usage of .astream_events and
.astream_log should now yield chat model stream events
- In future PRs we can update all subclasses to reflect these two things
now handled by the base class, but in the meantime all will continue to work
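The division of labour described above can be sketched with hypothetical classes (`BaseChatSketch` and `EchoChat` are not `langchain_core`'s `BaseChatModel`, and `on_llm_new_token` is modelled as a plain callback). The base class fires the per-chunk callback and picks a default for `stream` when a listener is present. The subclass only yields chunks.

```python
from typing import Callable, Iterator, List, Optional

TokenCallback = Callable[[str], None]


class BaseChatSketch:
    """Hypothetical base class: streaming and callbacks are handled here."""

    def _stream(self, prompt: str) -> Iterator[str]:
        raise NotImplementedError

    def _generate(self, prompt: str) -> str:
        raise NotImplementedError

    def stream(
        self, prompt: str, on_new_token: Optional[TokenCallback] = None
    ) -> Iterator[str]:
        for chunk in self._stream(prompt):
            if on_new_token is not None:
                on_new_token(chunk)  # base class fires the callback, not the subclass
            yield chunk

    def generate(
        self,
        prompt: str,
        stream: Optional[bool] = None,
        on_new_token: Optional[TokenCallback] = None,
    ) -> str:
        if stream is None:
            # Default to streaming only when a handler is listening for tokens.
            stream = on_new_token is not None
        if stream:
            return "".join(self.stream(prompt, on_new_token))
        return self._generate(prompt)


class EchoChat(BaseChatSketch):
    """Subclass only yields chunks; it never touches callbacks."""

    def _stream(self, prompt: str) -> Iterator[str]:
        for word in prompt.split():
            yield word + " "

    def _generate(self, prompt: str) -> str:
        return "".join(self._stream(prompt))


seen: List[str] = []
streamed = EchoChat().generate("hello chat model", on_new_token=seen.append)
plain = EchoChat().generate("hello chat model")
print(seen, streamed == plain)
```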
5 months ago
harry-cohere 3685f8ceac
cohere[patch]: Add cohere tools agent (#19602)
**Description**: Adds a cohere tools agent and related notebook.

---------

Co-authored-by: BeatrixCohere <128378696+BeatrixCohere@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
5 months ago
William FH 5c41f4083e
[Evals] Fix function calling support (#19658)
Current implementation is overzealous in validating chat datasets

Fixes
[#langsmith-sdk:557](https://github.com/langchain-ai/langsmith-sdk/issues/557)
5 months ago
yongheng.liu 7e29b6061f
community[minor]: integrate China Mobile Ecloud vector search (#15298)
- **Description:** integrate China Mobile Ecloud vector search
- **Dependencies:** elasticsearch==7.10.1

Co-authored-by: liuyongheng <liuyongheng@cmss.chinamobile.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Hyeongchan Kim 9b70131aed
community[patch]: refactor the type hint of `file_path` in `UnstructuredAPIFileLoader` class (#18839)
* **Description**: add `None` type for `file_path` along with `str` and
`List[str]` types.
* `file_path`/`filename` arguments in `get_elements_from_api()` and
`partition()` can be `None`, however, there's no `None` type hint for
`file_path` in `UnstructuredAPIFileLoader` and `UnstructuredFileLoader`
currently.
* Calling the function with `file_path=None` works, but the IDE flags it
because of the missing type hint.
* **Issue**: N/A
* **Dependencies**: N/A
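As an illustration of the widened annotation, here is a minimal sketch with a hypothetical helper `normalize_paths`, not the loader's code. `Union[str, List[str], None]` is equivalent to `Optional[Union[str, List[str]]]`, so type checkers accept `None` alongside one path or a list of paths.

```python
from typing import List, Union


def normalize_paths(file_path: Union[str, List[str], None] = None) -> List[str]:
    """Hypothetical helper: accept one path, several paths, or None."""
    if file_path is None:
        # e.g. the caller supplies a file object elsewhere instead of a path
        return []
    if isinstance(file_path, str):
        return [file_path]
    return list(file_path)


print(normalize_paths(None))                 # []
print(normalize_paths("a.pdf"))              # ['a.pdf']
print(normalize_paths(["a.pdf", "b.txt"]))   # ['a.pdf', 'b.txt']
```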

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
5 months ago
CaroFG cf96060ab7
community[patch]: update for compatibility with latest Meilisearch version (#18970)
- **Description:** Updates Meilisearch vectorstore for compatibility
with v1.6 and above. Adds the embedders settings and `embedder_name`,
which are now required.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
chyroc be2adb1083
community[patch]: support unstructured_kwargs for s3 loader (#15473)
fix https://github.com/langchain-ai/langchain/issues/15472

Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Bagatur b901649032
docs: move extraction up (#19667) 5 months ago
Kahlil Wehmeyer 9c08cdea92
core[patch]: ToolException docs/exception message (#17590)
**Description:**
This PR adds a slightly more helpful message to a Tool Exception

```
# current state
langchain_core.tools.ToolException: Too many arguments to single-input tool

# proposed state
langchain_core.tools.ToolException: Too many arguments to single-input tool. Consider using a StructuredTool instead.
```
**Issue:** Somewhat discussed here 👉 #6197
**Dependencies:** None
**Twitter handle:** N/A

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
5 months ago
Evgenii Zheltonozhskii 5b1f9c6d3a
infra: Consistent lxml requirements (#19520)
Update the dependency for lxml to be consistent among different
packages; should fix
https://github.com/langchain-ai/langchain/issues/19040
5 months ago
Filip Michalsky 2fceec3771
docs: update cookbook example for SalesGPT - include Stripe Payment Link Generation (#19622)
- [ ] **cookbook** - update example for SalesGPT - include Stripe
Payment Link Generation

- **Description:** We updated the Jupyter notebook example with the
ability of the AI Agent to negotiate with customers and then close the
deal by generating a custom Stripe payment link.
- **Issue:** N/A
- **Dependencies:** N/A
- **Twitter handle:** @FilipMichalsky @0xtotaylor

---------

Co-authored-by: Filip Michalsky <filip_michalsky@g.harvard.edu>
Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Christophe Bornet 33fa8cfcd0
core[minor]: Add async methods to MaxMarginalRelevanceExampleSelector (#19639) 5 months ago
Taqi Jaffri 72c8b3127d
cli[patch]: Fix typo in dev script name for the --chat-playground option on the cli (#19673)
Fixes typo

---------

Co-authored-by: Taqi Jaffri <tjaffri@docugami.com>
5 months ago
Jan Nissen 2e0ddd6fb8
core[minor]: support pydantic v2 models in PydanticOutputParser (#18811)
As mentioned in #18322, the current PydanticOutputParser won't work for
anyone trying to parse to pydantic v2 models. This PR adds a separate
`PydanticV2OutputParser`, as well as a `langchain_core.pydantic_v2`
namespace that will fail on import to any projects using pydantic<2.
Happy to update the docs for output parsers if this is something we're
interested in adding.

On a separate note, I also updated `check_pydantic.sh` to detect
pydantic imports with leading whitespace and excluded the internal
namespaces. That change can be separated into its own PR if needed.

---------

Co-authored-by: Jan Nissen <jan23@gmail.com>
5 months ago
Kangmoon Seo d0accc3275
docs: fix error output in XMLOutputParser documentation (#19569)
- **Description:** I've made a fix to a ParseError call in the
XMLOutputParser documentation.
- **Issue:** None
- **Dependencies:** None

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
5 months ago
Tomaz Bratanic 87d2a6b777
community[minor]: Add the option to omit schema refresh in Neo4jGraph (#19654) 5 months ago
Bagatur 5fc6531c74
docs: use first_tool_only instead of return_single (#19666) 5 months ago
jhicks2306 bcb8ab5216
docs: Improve docstring for Runnable bind method (#19659)
Added example to the docstring of the "bind" method of Runnable. This
makes it easier to understand the purpose of the method when reviewing
in code editors. E.g. VS Code below.

<img width="833" alt="Screenshot 2024-03-27 at 16 24 18"
src="https://github.com/langchain-ai/langchain/assets/45722942/ad022d4e-7bc0-4f4b-aa7a-838f1816cc52">

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
5 months ago
ccurme 4e9b358ed8
docs: Fix broken imports in documentation (#19655)
Found via script in https://github.com/langchain-ai/langchain/pull/19611
5 months ago
Rajendra Kadam 0019d8a948
community[minor]: Add support for non-file-based Document Loaders in PebbloSafeLoader (#19574)
**Description:**
PebbloSafeLoader: Add support for non-file-based Document Loaders

This pull request enhances PebbloSafeLoader by introducing support for
several non-file-based Document Loaders. With this update,
PebbloSafeLoader now seamlessly integrates with the following loaders:
- GoogleDriveLoader
- SlackDirectoryLoader
- Unstructured EmailLoader

**Issue:** NA
**Dependencies:** - None
**Twitter handle:** @Raj__725

---------

Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
5 months ago
Christophe Bornet 9954c6a38e
langchain[minor]: Add async methods to EncoderBackedStore (#19597)
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
5 months ago
Erick Friis 929ed65554
cohere[patch]: release 0.1.0rc1 (#19663) 5 months ago
hulitaitai dc2c9dd4d7
Update text2vec.py (#19657)
Add the URL of the embedding tool "text2vec".
Fix minor mistakes in the docstring.
5 months ago
Erick Friis 7630e9529c
Revert "community: added `partners/package-name` folders" (#19662)
Reverts langchain-ai/langchain#19290
5 months ago
Christophe Bornet 409c6eeb0b
core: Add async methods to LengthBasedExampleSelector (#19640) 5 months ago
Bagatur c7f1962f73
core[patch]: Release 0.1.35 (#19660) 5 months ago
Eugene Yurtsev e8339b1d83
core[patch]: Patch XML vulnerability in XMLOutputParser (CVE-2024-1455) (#19653)
Patch potential XML vulnerability CVE-2024-1455

This patches a potential XML vulnerability in the XMLOutputParser in
langchain-core. The vulnerability in some situations could lead to a
denial of service attack.

At risk are users who:

1) Run older distributions of Python that have older versions of
libexpat
2) Use XMLOutputParser with an agent
3) Accept inputs from untrusted sources with this agent (e.g., an endpoint
on the web that allows an untrusted user to interact with the parser)
5 months ago
Guangdong Liu 7042934b5f
community[patch]: Fix the bug that Chroma does not specify `embedding_function` (#19277)
- **Issue:** close #18291
- @baskaryan, @eyurtsev PTAL
5 months ago
billytrend-cohere 85f57ab4cd
cohere[patch]: Fix cohere rerank (#19624)
Fix cohere rerank inspired by
https://github.com/langchain-ai/langchain/pull/19486
5 months ago
Eugene Yurtsev 8ab7bb3166
core[patch]: XMLOutputParser fix to handle changes to xml standard library (#19612)
The newest Python micro releases broke streaming in the XMLOutputParser. This fixes the parsing code to work with trailing junk after the XML content.
5 months ago
yuwenzho 3a7d2cf443
community[minor]: Add ITREX optimized Embeddings (#18474)
Introduction
[Intel® Extension for
Transformers](https://github.com/intel/intel-extension-for-transformers)
is an innovative toolkit designed to accelerate GenAI/LLM everywhere
with the optimal performance of Transformer-based models on various
Intel platforms.

Description

- Adds ITREX runtime embeddings using intel-extension-for-transformers.
- Adds MDX documentation and example notebooks.
- Adds embedding import tests.

---------

Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
5 months ago
Juan Jose Miguel Ovalle Villamil 1fe10a3e3d
experimental[patch]: Enhance LLMGraphTransformer with async processing and improved readability (#19205)
- [x] **PR title**: "experimental: Enhance LLMGraphTransformer with
async processing and improved readability"


- [x] **PR message**: 
- **Description:** This pull request refactors the `process_response`
and `convert_to_graph_documents` methods in the LLMGraphTransformer
class to improve code readability and adds async versions of these
methods for concurrent processing.
    The main changes include:
- Simplifying list comprehensions and conditional logic in the
process_response method for better readability.
- Adding async versions aprocess_response and
aconvert_to_graph_documents to enable concurrent processing of
documents.
These enhancements aim to improve the overall efficiency and
maintainability of the `LLMGraphTransformer` class.
  - **Issue:** N/A
  - **Dependencies:** No additional dependencies required.
  - **Twitter handle:** @jjovalle99
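The concurrency gain from async versions like `aconvert_to_graph_documents` usually comes from scheduling all per-document calls at once. Below is a minimal sketch with `asyncio.gather`, where `aprocess_document` is a hypothetical stand-in for an async LLM extraction call (here it just picks capitalized words as "nodes"). It is not the `LLMGraphTransformer` code.

```python
import asyncio
from typing import List


async def aprocess_document(text: str) -> List[str]:
    """Hypothetical stand-in for an async LLM call that extracts node names."""
    await asyncio.sleep(0.01)  # simulate network latency
    return sorted({word for word in text.split() if word.istitle()})


async def aconvert_all(documents: List[str]) -> List[List[str]]:
    # Schedule every document at once; gather returns results in input order.
    return await asyncio.gather(*(aprocess_document(d) for d in documents))


docs = ["Alice knows Bob", "Bob works at Acme", "nothing here"]
results = asyncio.run(aconvert_all(docs))
print(results)  # [['Alice', 'Bob'], ['Acme', 'Bob'], []]
```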


- [x] **Add tests and docs**: N/A (This PR does not introduce a new
integration)


- [x] **Lint and test**: Ran make format, make lint, and make test from
the root of the modified package(s). All tests pass successfully.

Additional notes:

- The changes made in this PR are backwards compatible and do not
introduce any breaking changes.
- The PR touches only the `LLMGraphTransformer` class within the
experimental package.

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
5 months ago