Commit Graph

10516 Commits

Author SHA1 Message Date
ccurme
a197a8e184
openai[patch]: move test (#24552)
The no-override tests (https://github.com/langchain-ai/langchain/pull/24407)
include a condition that integrations do not implement additional tests.
2024-07-23 10:22:22 -04:00
Eugene Yurtsev
0bb54ab9f0
CI: Temporarily disable min version checking on pull request (#24551)
Short term to fix CI
2024-07-23 14:12:08 +00:00
Eugene Yurtsev
f47b4edcc2
standard-test: Fix typo in skipif for chat model integration tests (#24553) 2024-07-23 10:11:01 -04:00
Jesse Wright
837a3d400b
chore(docs): SQARQL -> SPARQL typo fix (#24536)
nit picky typo fix
2024-07-23 13:39:34 +00:00
Eugene Yurtsev
20b72a044c
standard-tests: Add BaseModel variations tests to with_structured_output (#24527)
After this, the standard tests will test with the following combinations:

1. pydantic.BaseModel
2. pydantic.v1.BaseModel

If run within a matrix, this covers both the pydantic.BaseModel originating
from pydantic 1 and the one defined in pydantic 2.
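Not the standard-test code itself, but a minimal sketch of the behaviour under
test (the model name and `Person` schema are illustrative; assumes pydantic 2
and `langchain_openai` are installed):

```python
from langchain_openai import ChatOpenAI
from pydantic import BaseModel as BaseModelV2
from pydantic.v1 import BaseModel as BaseModelV1


class PersonV2(BaseModelV2):
    """Pydantic 2 schema."""
    name: str
    age: int


class PersonV1(BaseModelV1):
    """Pydantic 1-compatible schema (pydantic.v1 namespace)."""
    name: str
    age: int


model = ChatOpenAI(model="gpt-4o-mini")

# Both BaseModel variants should round-trip through with_structured_output.
for schema in (PersonV2, PersonV1):
    structured = model.with_structured_output(schema)
    result = structured.invoke("Anna is 30 years old.")
    assert isinstance(result, schema)
```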
2024-07-23 09:01:26 -04:00
Bagatur
70c71efcab
core[patch]: merge_content fix (#24526) 2024-07-22 22:20:22 -07:00
Ben Chambers
a5a3d28776
community[patch]: Remove targets_table from C* GraphVectorStore (#24502)
- **Description:** Remove the unnecessary `targets_table` parameter
2024-07-22 22:09:36 -04:00
Alexander Golodkov
2a70a07aad
community[minor]: added new document loaders based on dedoc library (#24303)
### Description
This pull request added new document loaders to load documents of
various formats using [Dedoc](https://github.com/ispras/dedoc):
  - `DedocFileLoader` (determine file types automatically and parse)
  - `DedocPDFLoader` (for `PDF` and images parsing)
- `DedocAPIFileLoader` (determine file types automatically and parse
using Dedoc API without library installation)

[Dedoc](https://dedoc.readthedocs.io) is an open-source library/service
that extracts texts, tables, attached files and document structure
(e.g., titles, list items, etc.) from files of various formats. The
library is actively developed and maintained by a group of developers.

`Dedoc` supports `DOCX`, `XLSX`, `PPTX`, `EML`, `HTML`, `PDF`, images
and more.
Full list of supported formats can be found
[here](https://dedoc.readthedocs.io/en/latest/#id1).
For `PDF` documents, `Dedoc` can determine whether the textual layer is
correct and split the document into paragraphs.
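
A minimal usage sketch (assumes the optional `dedoc` dependency is installed
and `./example.docx` exists; the full set of options is shown in the example
notebook below):

```python
from langchain_community.document_loaders import DedocFileLoader

# The file type is detected automatically; Dedoc extracts text, tables
# and structure items from the document.
loader = DedocFileLoader("./example.docx")
docs = loader.load()

print(docs[0].page_content[:200])
print(docs[0].metadata)
```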


### Issue
This pull request extends the variety of document loaders supported by
`langchain_community`, allowing users to choose the most suitable option
for parsing raw documents.

### Dependencies
The PR added a new (optional) dependency `dedoc>=2.2.5` ([library
documentation](https://dedoc.readthedocs.io)) to the
`extended_testing_deps.txt`

### Twitter handle
None

### Add tests and docs
1. Test for the integration:
`libs/community/tests/integration_tests/document_loaders/test_dedoc.py`
2. Example notebook:
`docs/docs/integrations/document_loaders/dedoc.ipynb`
3. Information about the library:
`docs/docs/integrations/providers/dedoc.mdx`

### Lint and test

Done locally:

  - `make format`
  - `make lint`
  - `make integration_tests`
  - `make docs_build` (from the project root)

---------

Co-authored-by: Nasty <bogatenkova.anastasiya@mail.ru>
2024-07-23 02:04:53 +00:00
Ben Chambers
5ac936a284
community[minor]: add document transformer for extracting links (#24186)
- **Description:** Add a DocumentTransformer for executing one or more
`LinkExtractor`s and adding the extracted links to each document (see the
sketch below).
- **Issue:** n/a
- **Dependencies:** none
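
A rough usage sketch (assumes the classes land as `LinkExtractorTransformer`
and `HtmlLinkExtractor` under `langchain_community.graph_vectorstores.extractors`
and that `beautifulsoup4` is installed; exact names and import paths may differ):

```python
from langchain_core.documents import Document
from langchain_community.graph_vectorstores.extractors import (
    HtmlLinkExtractor,
    LinkExtractorTransformer,
)

doc = Document(
    page_content='<a href="https://python.langchain.com">LangChain docs</a>',
    metadata={"source": "https://example.com"},
)

# Run the extractor over each document and attach the extracted links
# to the document's metadata.
transformer = LinkExtractorTransformer(
    [HtmlLinkExtractor().as_document_extractor()]
)
transformed = transformer.transform_documents([doc])
print(transformed[0].metadata)
```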

---------

Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
2024-07-22 22:01:21 -04:00
Jacob Lee
3c4652c906
docs[patch]: Hide OllamaFunctions now that Ollama supports tool calling (#24523) 2024-07-22 17:56:51 -07:00
Erick Friis
2c6b9e8771
standard-tests: add override check (#24407) 2024-07-22 23:38:01 +00:00
Nithish Raghunandanan
1639ccfd15
couchbase: [patch] Return chat message history in order (#24498)
**Description:** Fixes an issue where the chat message history was not
returned in order. This is now fixed by returning messages sorted by their
timestamps (see the sketch below).
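
The underlying idea, sketched with plain dicts rather than the actual
Couchbase document schema (field names here are illustrative):

```python
from datetime import datetime, timezone

# Messages as stored, each with a created-at timestamp (illustrative).
stored = [
    {"role": "ai", "content": "Hello Bob!", "ts": datetime(2024, 7, 22, 10, 0, 5, tzinfo=timezone.utc)},
    {"role": "human", "content": "Hi, I'm Bob.", "ts": datetime(2024, 7, 22, 10, 0, 0, tzinfo=timezone.utc)},
]

# Sorting by timestamp restores the original conversation order.
ordered = sorted(stored, key=lambda m: m["ts"])
assert [m["role"] for m in ordered] == ["human", "ai"]
```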

- [x] **Add tests and docs**: Updated the tests to check the order.


- [x] **Lint and test**: Run `make format`, `make lint` and `make test`
from the root of the package(s) you've modified. See contribution
guidelines for more: https://python.langchain.com/docs/contributing/

---------

Co-authored-by: Nithish Raghunandanan <nithishr@users.noreply.github.com>
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-07-22 23:30:29 +00:00
C K Ashby
ab036c1a4c
docs: Update .run() to .invoke() (#24520) 2024-07-22 14:21:33 -07:00
Erick Friis
3dce2e1d35
all: add release notes to pypi (#24519) 2024-07-22 13:59:13 -07:00
Bagatur
c48e99e7f2
docs: fix sql db note (#24505) 2024-07-22 13:30:29 -07:00
Bagatur
8a140ee77c
core[patch]: don't serialize BasePromptTemplate.input_types (#24516)
Candidate fix for #24513
2024-07-22 13:30:16 -07:00
MarkYQJ
df357f82ca
ignore the first turn to apply "history" mechanism (#14118)
On the first turn the chat history is empty, so the prompt used to generate
the condense question contains a meaningless "system: " string; this increases
the probability of producing an improper condense question that misunderstands
the user's question. Below is an example case:
- Original question: Can you explain the arguments of Meilisearch?
- Condense question
  - What are the benefits of using Meilisearch? (by CodeLlama)
  - What are the reasons for using Meilisearch? (by GPT-4)

The condense questions (whether from CodeLlama or GPT-4) differ from the
original one.

This change checks the content of each dialogue turn and generates the history
string only when the turn's content is not empty. Since there is nothing
before the first turn, the "history" mechanism is skipped on the very first
turn.

With this change, the condense question becomes "What are the arguments for
using Meilisearch?".
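
A simplified sketch of the behaviour (not the actual chain code): the history
string is built only from non-empty turns, so the very first turn contributes
nothing.

```python
from typing import List, Tuple


def build_history(turns: List[Tuple[str, str]]) -> str:
    """Build the chat-history string, skipping turns with empty content."""
    lines = []
    for human, ai in turns:
        if human:
            lines.append(f"Human: {human}")
        if ai:
            lines.append(f"Assistant: {ai}")
    return "\n".join(lines)


print(repr(build_history([("", "")])))          # '' -> no "system: " placeholder
print(repr(build_history([("Hi", "Hello!")])))  # 'Human: Hi\nAssistant: Hello!'
```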

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: ccurme <chester.curme@gmail.com>
2024-07-22 20:11:17 +00:00
Bagatur
236e957abb
core,groq,openai,mistralai,robocorp,fireworks,anthropic[patch]: Update BaseModel subclass and instance checks to handle both v1 and proper namespaces (#24417)
After this PR, chat models will correctly handle pydantic 2 with
`bind_tools` and `with_structured_output`.


```python
import pydantic
print(pydantic.__version__)
```
2.8.2

```python
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field

class Add(BaseModel):
    x: int
    y: int

model = ChatOpenAI().bind_tools([Add])
print(model.invoke('2 + 5').tool_calls)

model = ChatOpenAI().with_structured_output(Add)
print(type(model.invoke('2 + 5')))
```

```
[{'name': 'Add', 'args': {'x': 2, 'y': 5}, 'id': 'call_PNUFa4pdfNOYXxIMHc6ps2Do', 'type': 'tool_call'}]
<class '__main__.Add'>
```


```python
from langchain_openai import ChatOpenAI
from pydantic.v1 import BaseModel, Field

class Add(BaseModel):
    x: int
    y: int

model = ChatOpenAI().bind_tools([Add])
print(model.invoke('2 + 5').tool_calls)

model = ChatOpenAI().with_structured_output(Add)
print(type(model.invoke('2 + 5')))
```

```python
[{'name': 'Add', 'args': {'x': 2, 'y': 5}, 'id': 'call_hhiHYP441cp14TtrHKx3Upg0', 'type': 'tool_call'}]
<class '__main__.Add'>
```

Addresses issues: https://github.com/langchain-ai/langchain/issues/22782

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-07-22 20:07:39 +00:00
C K Ashby
199e64d372
Please spell Lex's name correctly Fridman (#24517)
https://www.youtube.com/watch?v=ZIyB9e_7a4c

2024-07-22 19:38:32 +00:00
Erick Friis
1f01c0fd98
infra: remove core from min version pr testing (#24507)
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-07-22 17:46:15 +00:00
Naka Masato
884f76e05a
fix: load google credentials properly in GoogleDriveLoader (#12871)
- **Description:** 
- Fix #12870: set scope in `default` func (ref:
https://google-auth.readthedocs.io/en/master/reference/google.auth.html)
- Moved the code to load default credentials to the bottom for clarity
of the logic
	- Add docstring and comment for each credential loading logic
- **Issue:** https://github.com/langchain-ai/langchain/issues/12870
- **Dependencies:** no dependencies change
- **Twitter handle:** @gymnstcs

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-07-22 17:43:33 +00:00
Erick Friis
a45337ea07
ollama: release 0.1.0 (#24510) 2024-07-22 10:35:26 -07:00
Isaac Francisco
1318d534af
[docs]: minor react change (#24509) 2024-07-22 10:25:01 -07:00
Jorge Piedrahita Ortiz
10e3982b59
community: sambanova integration minor changes (#24503)
- Minor changes in the SambaNova LLM integration
  - default API
  - docstrings
- minor changes in docs
2024-07-22 17:06:35 +00:00
maang-h
721f709dec
community: Improve QianfanChatEndpoint tool result to model (#24466)
- **Description:** When `QianfanChatEndpoint` uses a tool result to answer a
question, the tool message content is currently required to be a dict. Users
could be asked to return a dict when calling the tool, but to stay consistent
with the other chat models this change is needed (see the sketch below).
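
A usage sketch of the behaviour after this change (assumes Qianfan credentials
are configured; the tool call itself is illustrative):

```python
from langchain_community.chat_models import QianfanChatEndpoint
from langchain_core.messages import AIMessage, HumanMessage, ToolMessage

llm = QianfanChatEndpoint()

messages = [
    HumanMessage(content="What is the weather in Shanghai?"),
    AIMessage(
        content="",
        tool_calls=[{"name": "get_weather", "args": {"city": "Shanghai"}, "id": "call_1"}],
    ),
    # A plain-string tool result is now accepted, matching other chat models
    # (previously the content had to be a dict).
    ToolMessage(content="Sunny, 32°C", tool_call_id="call_1"),
]

print(llm.invoke(messages).content)
```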
2024-07-22 11:29:00 -04:00
Chaunte W. Lacewell
02f0a29293
Cookbook: Add Visual RAG example using VDMS (#24353)
- **Description:** Add a notebook demonstrating visual RAG, which uses both
video scene descriptions generated by open-source vision models (e.g.,
video-llama, video-llava) as text embeddings and frames as image embeddings
to perform vector similarity search with VDMS.
  - **Issue:** N/A
  - **Dependencies:** N/A
2024-07-22 11:16:06 -04:00
ccurme
dcba7df2fe
community[patch]: deprecate langchain_community Chroma in favor of langchain_chroma (#24474) 2024-07-22 11:00:13 -04:00
ccurme
0f7569ddbc
core[patch]: enable RunnableWithMessageHistory without config (#23775)
Feedback that `RunnableWithMessageHistory` is unwieldy compared to
ConversationChain and similar legacy abstractions is common.

Legacy chains using memory typically had no explicit notion of threads
or separate sessions. To use `RunnableWithMessageHistory`, users are
forced to introduce this concept into their code. This possibly felt
like unnecessary boilerplate.

Here we enable `RunnableWithMessageHistory` to run without a config if
the `get_session_history` callable has no arguments. This enables
minimal implementations like the following:
```python
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo-0125")
memory = InMemoryChatMessageHistory()
chain = RunnableWithMessageHistory(llm, lambda: memory)

chain.invoke("Hi I'm Bob")  # Hello Bob!
chain.invoke("What is my name?")  # Your name is Bob.
```
2024-07-22 10:36:53 -04:00
Mohammad Mohtashim
5ade0187d0
[Community]: Prompts fixed for ZERO_SHOT_REACT agent type in create_sql_agent function (#23693)
- **Description:** The correct prompts for ZERO_SHOT_REACT were not being used
in the `create_sql_agent` function: the specific `SQL_PREFIX` and `SQL_SUFFIX`
prompts were not applied when the client did not provide any prompts. This is
now fixed (a usage sketch follows below).
- **Issue:** #23585
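
A usage sketch of the fixed path (the Chinook database and model name are
illustrative): with no custom prompts supplied, the agent should now fall back
to the SQL-specific `SQL_PREFIX`/`SQL_SUFFIX` prompts.

```python
from langchain_community.agent_toolkits import create_sql_agent
from langchain_community.utilities import SQLDatabase
from langchain_openai import ChatOpenAI

db = SQLDatabase.from_uri("sqlite:///Chinook.db")
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# No prefix/suffix passed: the ZERO_SHOT_REACT agent uses the SQL prompts.
agent = create_sql_agent(
    llm=llm,
    db=db,
    agent_type="zero-shot-react-description",
)
agent.invoke({"input": "How many employees are there?"})
```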

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-07-22 14:04:20 +00:00
ZhangShenao
0f6737cbfe
[Vector Store] Fix function add_texts in TencentVectorDB (#24469)
Regardless of whether `embedding_func` is set, the 'text' attribute of the
document should be assigned; otherwise the `page_content` of documents in the
final search results is lost.
2024-07-22 09:50:22 -04:00
남광우
7ab82eb8cc
langchain: Copy libs/standard-tests folder when building devcontainer (#24470)
### Description

* Fix `libs/langchain/dev.Dockerfile`: copy the `libs/standard-tests` folder
when building the devcontainer.
* The `poetry install --no-interaction --no-ansi --with dev,test,docs` command
requires this folder, but it was not copied.

### Reference

#### Error message when building the devcontainer from the master branch

```
...

[2024-07-20T14:27:34.779Z] ------
 > [langchain langchain-dev-dependencies 7/7] RUN poetry install --no-interaction --no-ansi --with dev,test,docs:
0.409 
0.409 Directory ../standard-tests does not exist
------

...
```

#### After the fix

Build succeeds in VS Code:

<img width="866" alt="image"
src="https://github.com/user-attachments/assets/10db1b50-6fcf-4dfe-83e1-d93c96aa2317">
2024-07-22 13:46:38 +00:00
rbrugaro
37b89fb7fc
fix RAG with quantized embeddings notebook (#24422)
1. Fix the HuggingFacePipeline import error by moving to the newer partner package
2. Switch to IPEXModelForCausalLM for performance

There are no dependency changes, since optimum-intel is also needed for
QuantizedBiEncoderEmbeddings.

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-07-22 13:44:03 +00:00
Thomas Meike
40c02cedaf
langchain[patch]: add async methods to ConversationSummaryBufferMemory (#20956)
Added asynchronously callable methods according to the
ConversationSummaryBufferMemory API documentation.
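
A minimal sketch of the async usage (assumes an OpenAI key is configured; the
model name is illustrative):

```python
import asyncio

from langchain.memory import ConversationSummaryBufferMemory
from langchain_openai import ChatOpenAI


async def main() -> None:
    memory = ConversationSummaryBufferMemory(
        llm=ChatOpenAI(model="gpt-4o-mini"), max_token_limit=100
    )
    # Async counterparts of save_context / load_memory_variables.
    await memory.asave_context({"input": "Hi, I'm Bob."}, {"output": "Hello Bob!"})
    print(await memory.aload_memory_variables({}))


asyncio.run(main())
```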

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-07-22 09:21:43 -04:00
Steve Sharp
cecd875cdc
docs: Update streaming.ipynb (typo fix) (#24483)
**Description:** Fixes typo `Le'ts` -> `Let's`.

2024-07-22 11:09:13 +00:00
Sheng Han Lim
0c6a3fdd6b
langchain: Update ContextualCompressionRetriever base_retriever type to RetrieverLike (#24192)
**Description:**
When initializing retrievers with `configurable_fields` as base
retriever, `ContextualCompressionRetriever` validation fails with the
following error:

```
ValidationError: 1 validation error for ContextualCompressionRetriever
base_retriever
  Can't instantiate abstract class BaseRetriever with abstract method _get_relevant_documents (type=type_error)
```

Example code:

```python
esearch_retriever = VertexAISearchRetriever(
    project_id=GCP_PROJECT_ID,
    location_id="global",
    data_store_id=SEARCH_ENGINE_ID,
).configurable_fields(
    filter=ConfigurableField(id="vertex_search_filter", name="Vertex Search Filter")
)

# rerank documents with Vertex AI Rank API
reranker = VertexAIRank(
    project_id=GCP_PROJECT_ID,
    location_id=GCP_REGION,
    ranking_config="default_ranking_config",
)

retriever_with_reranker = ContextualCompressionRetriever(
    base_compressor=reranker, base_retriever=esearch_retriever
)
```

It seems the issue stems from `ContextualCompressionRetriever` insisting that
base retrievers strictly inherit from `BaseRetriever`, which doesn't account
for cases where retrievers need to be chained and can have configurable fields
defined.


0a1e475a30/libs/langchain/langchain/retrievers/contextual_compression.py (L15-L22)

This PR proposes that the base_retriever type be set to `RetrieverLike`,
similar to how `EnsembleRetriever` validates its list of retrievers:


0a1e475a30/libs/langchain/langchain/retrievers/ensemble.py (L58-L75)
2024-07-21 14:23:19 -04:00
clement.l
d98b830e4b
community: add flag to toggle progress bar (#24463)
- **Description:** Add a flag to determine whether to show the progress bar
- **Issue:** n/a
- **Dependencies:** n/a
- **Twitter handle:** n/a

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-07-20 13:18:02 +00:00
chuanbei888
6b08a33fa4
community: fix QianfanChatEndpoint default model (#24464)
The default model of `baidu_qianfan_endpoint` has been changed from
ERNIE-Bot-turbo to ERNIE-Lite-8K.
2024-07-20 13:00:29 +00:00
Nuno Campos
947628311b
core[patch]: Accept configurable keys top-level (#23806)
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-07-20 03:49:00 +00:00
Jesus Martinez
c1d1fc13c2
langchain[patch]: Remove multiagent return_direct validation (#24419)
**Description:**

When you use agents with multi-input tools and some of these tools have
`return_direct=True`, langchain throws an error from one of its validators.
This change is also implemented in the [JS
version](https://github.com/langchain-ai/langchainjs/pull/4643).

**Issue**:
This MR resolves #19843

**Dependencies:**

None

Co-authored-by: Jesus Martinez <jesusabraham.martinez@tyson.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
2024-07-20 03:27:43 +00:00
Will Badart
74e3d796f1
core[patch]: ensure iterator_ in scope for _atransform_stream_with_config except (#24454)
Before, if an exception was raised in the outer `try` block in
`Runnable._atransform_stream_with_config` before `iterator_` is
assigned, the corresponding `finally` block would blow up with an
`UnboundLocalError`:

```txt
UnboundLocalError: cannot access local variable 'iterator_' where it is not associated with a value
```

By assigning an initial value to `iterator_` before entering the `try` block,
this commit ensures that the `finally` block can run without burying the
"true" exception under a "During handling of the above exception [...]"
traceback.
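
A simplified sketch of the pattern (not the actual
`_atransform_stream_with_config` code): give the variable a value before the
`try` block so the `finally` clause can never raise `UnboundLocalError`.

```python
async def transform_with_cleanup(make_iterator):
    iterator = None  # assigned up front so the finally block is always safe
    try:
        iterator = await make_iterator()  # may raise before assignment succeeds
        async for item in iterator:
            yield item
    finally:
        # Without the initial assignment, reaching this line after an early
        # failure would raise UnboundLocalError and mask the real exception.
        if iterator is not None and hasattr(iterator, "aclose"):
            await iterator.aclose()
```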

Thanks for your consideration!
2024-07-20 03:24:04 +00:00
maang-h
7b28359719
docs: Add ChatSparkLLM docstrings (#24449)
- **Description:**
  - Add `ChatSparkLLM` docstrings (issue #22296)
  - Support the `stream` method
2024-07-19 20:19:14 -07:00
Eugene Yurtsev
5e48f35fba
core[minor]: Relax constraints on type checking for tools and parsers (#24459)
This will allow tools and parsers to accept pydantic models from any of
the following namespaces (see the sketch after this list):

* pydantic.BaseModel with pydantic 1
* pydantic.BaseModel with pydantic 2
* pydantic.v1.BaseModel with pydantic 2
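
A sketch of what this enables for tools (illustrative schemas; assumes
pydantic 2 is installed):

```python
from langchain_core.tools import tool
from pydantic import BaseModel as BaseModelV2
from pydantic.v1 import BaseModel as BaseModelV1


class MultiplyV2(BaseModelV2):
    a: int
    b: int


class MultiplyV1(BaseModelV1):
    a: int
    b: int


# Either schema can now be used as a tool's args_schema.
@tool(args_schema=MultiplyV2)
def multiply_v2(a: int, b: int) -> int:
    """Multiply two integers."""
    return a * b


@tool(args_schema=MultiplyV1)
def multiply_v1(a: int, b: int) -> int:
    """Multiply two integers."""
    return a * b


print(multiply_v2.invoke({"a": 2, "b": 3}))
print(multiply_v1.invoke({"a": 2, "b": 3}))
```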
2024-07-19 21:47:34 -04:00
Isaac Francisco
838464de25
ollama: init package (#23615)
Co-authored-by: Erick Friis <erick@langchain.dev>
2024-07-20 00:43:29 +00:00
Erick Friis
f4ee3c8a22
infra: add min version testing to pr test flow (#24358)
xfailing some SQL tests that do not currently work on sqlalchemy v1

#22207 was very much not sqlalchemy v1 compatible.

Moving forward, implementations should be compatible with both versions to
pass CI.
2024-07-19 22:03:19 +00:00
Erick Friis
50cb0a03bc
docs: advanced feature note (#24456)
fixes #24430
2024-07-19 20:05:59 +00:00
Bagatur
842065a9cc
community[patch]: Release 0.2.9 (#24453) 2024-07-19 12:50:22 -07:00
Bagatur
27ad6a4bb3
langchain[patch]: Release 0.2.10 (#24452) 2024-07-19 12:50:13 -07:00
Bagatur
dda9438e87
community[patch]: gpt-4o-mini costs (#24421) 2024-07-19 19:02:44 +00:00
Eugene Yurtsev
604dfe2d99
community[patch]: Force opt-in for WebResearchRetriever (CVE-2024-3095) (#24451)
This PR addresses the issue raised by (CVE-2024-3095)

https://huntr.com/bounties/e62d4895-2901-405b-9559-38276b6a5273

Unfortunately, we didn't do a good job writing the initial report. It's
pointing at both the wrong package and the wrong code.

The affected code is the WebResearchRetriever, not the AsyncHTMLLoader, and
the WebResearchRetriever lives in langchain-community.

The vulnerable code lives here: 

0bd3f4e129/libs/community/langchain_community/retrievers/web_research.py (L233-L233)


This PR adds a forced opt-in for users to make sure they are aware of
the risk and can mitigate by configuring a proxy:


0bd3f4e129/libs/community/langchain_community/retrievers/web_research.py (L84-L84)
2024-07-19 18:51:35 +00:00
Bagatur
f101c759ed
docs: how to pass runtime secrets (#24450) 2024-07-19 18:36:28 +00:00