
4742 Commits

Author SHA1 Message Date
Jordy Jackson Antunes da Rocha
a50eabbd48
experimental: LLMGraphTransformer add missing conditional adding restrictions to prompts for LLM that do not support function calling (#22793)
- Description: Modified the prompt created by the function
`create_unstructured_prompt` (which is called for LLMs that do not
support function calling) by adding conditional checks that verify
whether restrictions on entity types and rel_types should be added to
the prompt. If the user provides a sufficiently large text, the current
prompt **may** fail to produce results in some LLMs. I first saw this
issue when I implemented a custom LLM class that did not support
function calling and used Gemini 1.5 Pro, but I was able to replicate it
using OpenAI models.
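
A minimal sketch of the kind of conditional check described above. The helper name `build_restrictions` and its parameters are hypothetical and are not taken from the actual `create_unstructured_prompt` code; the point is only that a restriction sentence is added to the prompt when, and only when, the corresponding list was supplied.

```python
from typing import List, Optional


def build_restrictions(
    node_labels: Optional[List[str]] = None,
    rel_types: Optional[List[str]] = None,
) -> str:
    """Return restriction text for the prompt (hypothetical helper)."""
    parts = []
    # Only mention entity types when the caller actually supplied some.
    if node_labels:
        parts.append(f"Use only these entity types: {', '.join(node_labels)}.")
    # Likewise for relation types.
    if rel_types:
        parts.append(f"Use only these relation types: {', '.join(rel_types)}.")
    return " ".join(parts)


# With no restrictions the prompt gets no extra text at all.
print(repr(build_restrictions()))
print(build_restrictions(node_labels=["Person", "Music"]))
```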

By loading a sufficiently large text
```python
from langchain_community.llms import Ollama
from langchain_openai import ChatOpenAI, OpenAI
from langchain_core.prompts import PromptTemplate
import re
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_core.documents import Document

with open("texto-longo.txt", "r") as file:
    full_text = file.read()
    partial_text = full_text[:4000]

documents = [Document(page_content=partial_text)] # cropped to fit GPT 3.5 context window
```

And using the chat class (that has function calling)
```python
chat_openai = ChatOpenAI(model="gpt-3.5-turbo", model_kwargs={"seed": 42})
chat_gpt35_transformer = LLMGraphTransformer(llm=chat_openai)
graph_from_chat_gpt35 = chat_gpt35_transformer.convert_to_graph_documents(documents)
```
It works:
```
>>> print(graph_from_chat_gpt35[0].nodes)
[Node(id="Jesu, Joy of Man's Desiring", type='Music'), Node(id='Godel', type='Person'), Node(id='Johann Sebastian Bach', type='Person'), Node(id='clever way of encoding the complicated expressions as numbers', type='Concept')]
```

But if you try to use the non-chat LLM class (that does not support
function calling)
```python
openai = OpenAI(
    model="gpt-3.5-turbo-instruct",
    max_tokens=1000,
)
gpt35_transformer = LLMGraphTransformer(llm=openai)
graph_from_gpt35 = gpt35_transformer.convert_to_graph_documents(documents)
```

It uses the prompt that has issues and sometimes does not produce any
result
```
>>> print(graph_from_gpt35[0].nodes)
[]
```

After implementing the changes, I was able to use both classes more
consistently:

```shell
>>> chat_gpt35_transformer = LLMGraphTransformer(llm=chat_openai)
>>> graph_from_chat_gpt35 = chat_gpt35_transformer.convert_to_graph_documents(documents)
>>> print(graph_from_chat_gpt35[0].nodes)
[Node(id="Jesu, Joy Of Man'S Desiring", type='Music'), Node(id='Johann Sebastian Bach', type='Person'), Node(id='Godel', type='Person')]
>>> gpt35_transformer = LLMGraphTransformer(llm=openai)
>>> graph_from_gpt35 = gpt35_transformer.convert_to_graph_documents(documents)
>>> print(graph_from_gpt35[0].nodes)
[Node(id='I', type='Pronoun'), Node(id="JESU, JOY OF MAN'S DESIRING", type='Song'), Node(id='larger memory', type='Memory'), Node(id='this nice tree structure', type='Structure'), Node(id='how you can do it all with the numbers', type='Process'), Node(id='JOHANN SEBASTIAN BACH', type='Composer'), Node(id='type of structure', type='Characteristic'), Node(id='that', type='Pronoun'), Node(id='we', type='Pronoun'), Node(id='worry', type='Verb')]
```

The results are a little inconsistent because the GPT 3.5 model may
produce incomplete JSON due to the token limit, but that could be solved
(or mitigated) by checking for complete JSON when parsing it.
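
One possible shape for such a completeness check, as a sketch only: `parse_if_complete` is a hypothetical helper, not part of `LLMGraphTransformer`, and the sample strings are made up for illustration. It relies on `json.loads` raising `json.JSONDecodeError` when the model output was cut off mid-document.

```python
import json
from typing import Any, Optional


def parse_if_complete(raw: str) -> Optional[Any]:
    """Return parsed JSON, or None when the text is truncated or invalid."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Output that hit the token limit usually ends mid-string or mid-object.
        return None


# Illustrative inputs (not real model output).
complete = '[{"head": "Bach", "relation": "COMPOSED", "tail": "Jesu"}]'
truncated = '[{"head": "Bach", "relation": "COMPO'

print(parse_if_complete(complete))
print(parse_if_complete(truncated))
```

A caller could then retry or skip the document when `None` comes back, instead of silently returning an empty graph.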
2024-07-01 17:33:51 +00:00
Eugene Yurtsev
4f1821db3e
core[minor]: Add get_by_ids to vectorstore interface (#23594)
This PR adds a part of the indexing API proposed in this RFC
https://github.com/langchain-ai/langchain/pull/23544/files.

It allows rolling out `get_by_ids` which should be uncontroversial to
existing vectorstores without introducing new abstractions.

The semantics for this method depend on the ability to identify
returned documents using the new optional ID field on documents:
https://github.com/langchain-ai/langchain/pull/23411

Alternatives are:

1. Relax the sequence requirement

```python
def get_by_ids(self, ids: Iterable[str], /) -> Iterable[Document]:
```

Rejected:
- implementations are more likely to start batching with bad defaults
- users would need to call list() or we'd need to introduce another
convenience method

2. Support more kwargs

```python

def get_by_ids(self, ids: Sequence[str], /, **kwargs) -> List[Document]:
    ...
```

Rejected: 
- No need for `batch` parameter since IDs is a sequence
- Output cannot be customized since `Document` is fixed. (e.g.,
parameters could be useful to grab extra metadata like the vector that
was indexed with the Document or to project a part of the document)
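
For illustration, a toy in-memory store with the `Sequence[str] -> List[Document]` shape of `get_by_ids` that the alternatives above are compared against. The `Document` and `InMemoryStore` classes here are stand-ins written for this sketch, not the LangChain classes, and silently skipping unknown IDs is an assumption rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass
class Document:
    """Stand-in for the real Document class (hypothetical, for illustration)."""

    page_content: str
    id: Optional[str] = None


class InMemoryStore:
    """Toy store keyed by document ID (not the LangChain implementation)."""

    def __init__(self) -> None:
        self._docs: Dict[str, Document] = {}

    def add(self, doc: Document) -> None:
        if doc.id is None:
            raise ValueError("document needs an id to be retrievable")
        self._docs[doc.id] = doc

    def get_by_ids(self, ids: Sequence[str], /) -> List[Document]:
        # Skipping unknown IDs is an assumption of this sketch.
        return [self._docs[i] for i in ids if i in self._docs]


store = InMemoryStore()
store.add(Document("first", id="1"))
store.add(Document("second", id="2"))
print(store.get_by_ids(["2", "missing", "1"]))
```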
2024-07-01 13:04:33 -04:00
Valentin
bf402f902e
community: Fix LanceDB similarity search bug (#23591)
**Description:** LanceDB didn't allow querying the database using
similarity score thresholds because the metrics value was missing. This
PR simply fixes that bug.
**Issue:** not applicable
**Dependencies:** none
**Twitter handle:** not available

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-07-01 16:33:45 +00:00
Bagatur
389a568f9a
standard-tests[patch]: add anthropic format integration test (#23717) 2024-07-01 11:06:04 -04:00
Rafael Pereira
4b9517db85
Jira: Allow Jira access using only the token (#23708)
- **Description:** At the moment the Jira wrapper only accepts a
username and a password/token together. However, Jira also allows
connecting with only the token, which is useful in enterprise contexts.

Co-authored-by: rpereira <rafael.pereira@criticalsoftware.com>
2024-07-01 13:13:51 +00:00
Tim Van Wassenhove
24916c6703
community: Register pandas df in duckdb when creating vector_store (#23690)
- **Description:** Register pandas df in duckdb when creating
vector_store
- **Issue:** Resolves #23308
- **Dependencies:** None
- **Twitter handle:** @timvw

Co-authored-by: Tim Van Wassenhove <tim.van.wassenhove@telenetgroup.be>
2024-07-01 09:12:06 -04:00
Bagatur
29aa9d6750
groq[patch]: Release 0.1.6 (#23655) 2024-06-29 07:35:23 -04:00
Bagatur
f2d0c13a15
fireworks[patch]: Release 0.1.4 (#23654) 2024-06-29 07:35:16 -04:00
Bagatur
9a5e35d1ba
mistralai[patch]: Release 0.1.9 (#23653) 2024-06-29 07:35:09 -04:00
Mateusz Szewczyk
a78ccb993c
ibm: Add support for Chat Models (#22979) 2024-06-29 01:59:25 -07:00
Bagatur
af2c05e5f3
openai[patch]: Release 0.1.13 (#23651) 2024-06-28 17:10:30 -07:00
Bagatur
b63c7f10bc
anthropic[patch]: Release 0.1.17 (#23650) 2024-06-28 17:07:08 -07:00
Bagatur
fc8fd49328
openai, anthropic, ...: with_structured_output to pass in explicit tool choice (#23645)
...community, mistralai, groq, fireworks

part of #23644
2024-06-28 16:39:53 -07:00
Bagatur
81064017a9
docs: azure openai docstring (#23643)
part of #22296
2024-06-28 15:15:58 -07:00
Bagatur
381aedcc61
docs: standardize azure openai page (#23642)
part of #22296
2024-06-28 15:15:41 -07:00
Vadym Barda
e8d77002ea
core: add RemoveMessage (#23636)
This change adds a new message type `RemoveMessage`. This will enable
`langgraph` users to manually modify graph state (or have the graph
nodes modify the state) to remove messages by `id`

Examples:

* allow users to delete messages from state by calling

```python
graph.update_state(config, values=[RemoveMessage(id=state.values[-1].id)])
```

* allow nodes to delete messages

```python
graph.add_node("delete_messages", lambda state: [RemoveMessage(id=state[-1].id)])
```
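
To make the intended semantics concrete, here is a rough sketch of how a state reducer could treat such a removal marker. The `Message`, `RemoveMessage`, and `apply_updates` definitions below are simplified stand-ins written for this example; they are not the `langchain_core` or `langgraph` implementations.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Message:
    id: str
    content: str


@dataclass
class RemoveMessage:
    """Marker asking the reducer to drop the message with this id."""

    id: str


def apply_updates(
    state: List[Message], updates: List[Union[Message, RemoveMessage]]
) -> List[Message]:
    """Append ordinary messages and delete those targeted by a marker."""
    result = list(state)
    for update in updates:
        if isinstance(update, RemoveMessage):
            result = [m for m in result if m.id != update.id]
        else:
            result.append(update)
    return result


state = [Message("1", "hi"), Message("2", "bye")]
print(apply_updates(state, [RemoveMessage(id="2")]))
```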
2024-06-28 14:40:02 -07:00
ccurme
8fce8c6771
community: fix extended tests (#23640) 2024-06-28 16:35:38 -04:00
ccurme
5d93916665
openai[patch]: release 0.1.12 (#23641) 2024-06-28 19:51:16 +00:00
Jacob Lee
a032583b17
docs[patch]: Update diagrams (#23613) 2024-06-28 12:36:00 -07:00
ccurme
390ee8d971
standard-tests: add test for structured output (#23631)
- add test for structured output
- fix bug with structured output for Azure
- better testing on Groq (break out Mixtral + Llama3 and add xfails
where needed)
2024-06-28 15:01:40 -04:00
j pradhan
5f21eab491
community:perplexity[patch]: standardize init args (#21794)
updated request_timeout default alias value per related docstring.

Related to
[20085](https://github.com/langchain-ai/langchain/issues/20085)


---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-06-28 13:26:12 +00:00
mackong
11483b0fb8
community[patch]: set tool name for tongyi&qianfan llm (#22889)
- **Description:** The name of a ToolMessage defaults to None, which
makes the tool message sent to the LLM look like
 ```json
{"role": "tool",
   "tool_call_id": "",
   "content": "{\"time\": \"12:12\"}",
   "name": null}
```
However, the name seems to be essential for some LLMs, such as TongYi Qwen, so we need to set the name using the agent_action's tool value.
  - **Issue:** N/A
  - **Dependencies:** N/A
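
A sketch of the idea in the description above, under the assumption that the payload is a plain dict. `tool_message_to_dict` and `fallback_name` are hypothetical names for this example, not the Tongyi or Qianfan adapter code.

```python
from typing import Any, Dict, Optional


def tool_message_to_dict(
    content: str, tool_call_id: str, name: Optional[str], fallback_name: str
) -> Dict[str, Any]:
    """Build a tool-message payload that never carries a null name."""
    return {
        "role": "tool",
        "tool_call_id": tool_call_id,
        "content": content,
        # Fall back to the tool recorded on the agent action.
        "name": name or fallback_name,
    }


print(tool_message_to_dict('{"time": "12:12"}', "", None, "get_time"))
```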
2024-06-28 09:17:05 -04:00
Leonid Ganeline
e4caa41aa9
community: docstrings toolkits (#23616)
Added missed docstrings. Formatted docstrings to the consistent form.
2024-06-28 08:40:52 -04:00
ccurme
adf2dc13de
community: fix lint (#23611) 2024-06-27 22:12:16 +00:00
Leonid Ganeline
75a44fe951
core: chat_* docstrings (#23412)
Added missed docstrings. Formatted docstrings to the consistent form.
2024-06-27 17:29:38 -04:00
Bagatur
3b1fcb2a65
chroma[patch]: Release 0.1.2 (#23604) 2024-06-27 13:58:24 -07:00
Eugene Yurtsev
68f348357e
community[patch]: Test InMemoryVectorStore with RWAPI test suite (#23603)
Add standard test suite to InMemoryVectorStore implementation.
2024-06-27 16:43:43 -04:00
Eugene Yurtsev
da7beb1c38
core[patch]: Add unit test when catching generator exit (#23402)
This pr adds a unit test for:
https://github.com/langchain-ai/langchain/pull/22662
And narrows the scope where the exception is caught.
2024-06-27 20:36:07 +00:00
NG Sai Prasanth
5e6d23f27d
community: Standardise tool import for arxiv & semantic scholar (#23578)
- **Description:** Fixing the way users have to import Arxiv and
Semantic Scholar
- **Issue:** Changed to use `from langchain_community.tools.arxiv import
ArxivQueryRun` instead of `from langchain_community.tools.arxiv.tool
import ArxivQueryRun`
    - **Dependencies:** None
    - **Twitter handle:** Nope
2024-06-27 16:35:50 -04:00
ccurme
d04f657424
langchain[patch]: deprecate ConversationChain (#23504)
Would like some feedback on how to best incorporate legacy memory
objects into `RunnableWithMessageHistory`.
2024-06-27 16:32:44 -04:00
Ayo Ayibiowu
c6f700b7cb
fix(community): allow support for disabling max_tokens args (#21534)
This PR fixes an issue where the LiteLLM provider could not be used with
an unlimited token budget, because `max_tokens` could not be disabled.

This matters in an agent environment, where token usage can grow well
beyond the initially configured value and cause unexpected behavior.
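
A minimal sketch of the general pattern, assuming request parameters are assembled as a dict. `build_params` is a hypothetical helper and not the actual LiteLLM wrapper code: when `max_tokens` is `None`, the key is left out so the provider applies its own limit.

```python
from typing import Any, Dict, Optional


def build_params(model: str, max_tokens: Optional[int] = None) -> Dict[str, Any]:
    """Assemble request parameters, omitting max_tokens when it is disabled."""
    params: Dict[str, Any] = {"model": model}
    # Omit the key entirely so the provider falls back to its own default.
    if max_tokens is not None:
        params["max_tokens"] = max_tokens
    return params


print(build_params("some-model"))
print(build_params("some-model", max_tokens=256))
```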
2024-06-27 16:28:59 -04:00
Leonid Ganeline
c0fdbaac85
langchain: docstrings in agents root (#23561)
Added missed docstrings. Formatted docstrings to the consistent form.
2024-06-27 15:52:18 -04:00
Leonid Ganeline
b64c4b4750
langchain: docstrings agents nested (#23598)
Added missed docstrings. Formatted docstrings to the consistent form.

---------

Co-authored-by: ccurme <chester.curme@gmail.com>
2024-06-27 19:49:41 +00:00
mackong
70834cd741
community[patch]: support convert FunctionMessage for Tongyi (#23569)
**Description:** For a function-calling agent with Tongyi, the
AgentAction is converted to a FunctionMessage by

47f69fe0d8/libs/core/langchain_core/agents.py (L188)
However, Tongyi's *convert_message_to_dict* does not currently support
FunctionMessage

47f69fe0d8/libs/community/langchain_community/chat_models/tongyi.py (L184-L207)
As a result, the next round of the conversation fails with a *TypeError*
exception.

This patch adds the support to convert FunctionMessage for Tongyi.

**Issue:** N/A
**Dependencies:** N/A
2024-06-27 15:49:26 -04:00
Bagatur
d45ece0e58
chroma[patch]: loosen py req (#23599)
currently causes issues if you try adding to a project that supports
py<4
2024-06-27 12:40:59 -07:00
Mohammad Mohtashim
4796b7eb15
[Community [HuggingFace]]: Small Fix for ChatHuggingFace. (#22925)
- **Description:** A small fix that moves `available_endpoints` to avoid
the token error described in the issue below. Also added a conftest file
and updated the `scipy` and `numpy` versions in the poetry files to
support newer Python versions.
- **Issue:** #22804

---------

Co-authored-by: Erick Friis <erick@langchain.dev>
Co-authored-by: ccurme <chester.curme@gmail.com>
2024-06-27 19:37:20 +00:00
ccurme
bffc3c24a0
openai[patch]: release 0.1.11 (#23596) 2024-06-27 18:48:40 +00:00
ccurme
a1520357c8
openai[patch]: revert addition of "name" to supported properties for tool messages (#23600) 2024-06-27 18:40:04 +00:00
joshc-ai21
16a293cc3a
Small bug fixes (#23353)
Small bug fixes according to your comments

---------

Signed-off-by: Joffref <mariusjoffre@gmail.com>
Signed-off-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Baskar Gopinath <73015364+baskargopinath@users.noreply.github.com>
Co-authored-by: Chester Curme <chester.curme@gmail.com>
Co-authored-by: Mathis Joffre <51022808+Joffref@users.noreply.github.com>
Co-authored-by: Baur <baur.krykpayev@gmail.com>
Co-authored-by: Nuradil <nuradil.maksut@icloud.com>
Co-authored-by: Nuradil <133880216+yaksh0nti@users.noreply.github.com>
Co-authored-by: Jacob Lee <jacoblee93@gmail.com>
Co-authored-by: Rave Harpaz <rave.harpaz@oracle.com>
Co-authored-by: RHARPAZ <RHARPAZ@RHARPAZ-5750.us.oracle.com>
Co-authored-by: Arthur Cheng <arthur.cheng@oracle.com>
Co-authored-by: Tomaz Bratanic <bratanic.tomaz@gmail.com>
Co-authored-by: RUO <61719257+comsa33@users.noreply.github.com>
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
Co-authored-by: Luis Rueda <userlerueda@gmail.com>
Co-authored-by: Jib <Jibzade@gmail.com>
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
Co-authored-by: S M Zia Ur Rashid <smziaurrashid@gmail.com>
Co-authored-by: Ikko Eltociear Ashimine <eltociear@gmail.com>
Co-authored-by: yuncliu <lyc1990@qq.com>
Co-authored-by: wenngong <76683249+wenngong@users.noreply.github.com>
Co-authored-by: gongwn1 <gongwn1@lenovo.com>
Co-authored-by: Mirna Wong <89008547+mirnawong1@users.noreply.github.com>
Co-authored-by: Rahul Triptahi <rahul.psit.ec@gmail.com>
Co-authored-by: Rahul Tripathi <rauhl.psit.ec@gmail.com>
Co-authored-by: maang-h <55082429+maang-h@users.noreply.github.com>
Co-authored-by: asafg <asafg@ai21.com>
Co-authored-by: Asaf Joseph Gardin <39553475+Josephasafg@users.noreply.github.com>
2024-06-27 17:58:22 +00:00
ccurme
5536420bee
openai[patch]: add comment (#23595)
Forgot to push this to
https://github.com/langchain-ai/langchain/pull/23551
2024-06-27 16:47:14 +00:00
andrewmjc
9f0f3c7e29
partners[openai]: Add name field to tool message to match OpenAI spec (#23551)
Discovered alongside @t968914

  - **Description:**
According to OpenAI docs, tool messages (response from calling tools)
must have a 'name' field.

https://cookbook.openai.com/examples/how_to_call_functions_with_chat_models

  - **Issue:** N/A (as of right now)
  - **Dependencies:** N/A
  - **Twitter handle:** N/A

2024-06-27 12:42:36 -04:00
Krista Pratico
85e36b0f50
partners[openai]: only add stream_options to kwargs if requested (#23552)
- **Description:** This PR
https://github.com/langchain-ai/langchain/pull/22854 added the ability
to pass `stream_options` through to the openai service to get token
usage information in the response. Currently OpenAI supports this
parameter, but Azure OpenAI does not yet. For users who proxy their
calls to both services through ChatOpenAI, this breaks when targeting
Azure OpenAI (see related discussion opened in openai-python:
https://github.com/openai/openai-python/issues/1469#issuecomment-2192658630).

> Error code: 400 - {'error': {'code': None, 'message': 'Unrecognized
request argument supplied: stream_options', 'param': None, 'type':
'invalid_request_error'}}

This PR fixes the issue by only adding `stream_options` to the request
if it's actually requested by the user (i.e. set to True). If I'm not
mistaken, we have a test case that already covers this scenario:
https://github.com/langchain-ai/langchain/blob/master/libs/partners/openai/tests/integration_tests/chat_models/test_base.py#L398-L399

- **Issue:** Issue opened in openai-python:
https://github.com/openai/openai-python/issues/1469
  - **Dependencies:** N/A
  - **Twitter handle:** N/A
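
The approach in the description can be sketched roughly as follows. `build_stream_kwargs` and its `stream_usage` flag are names chosen for this example and are not the `ChatOpenAI` internals; `stream_options={"include_usage": True}` is the OpenAI request parameter the description refers to.

```python
from typing import Any, Dict


def build_stream_kwargs(stream_usage: bool = False, **kwargs: Any) -> Dict[str, Any]:
    """Build streaming request kwargs (hypothetical helper)."""
    params: Dict[str, Any] = {"stream": True, **kwargs}
    # Only send stream_options when the caller explicitly asked for usage data,
    # so services that reject the argument never see it.
    if stream_usage:
        params["stream_options"] = {"include_usage": True}
    return params


print(build_stream_kwargs(model="some-model"))
print(build_stream_kwargs(stream_usage=True, model="some-model"))
```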

---------

Co-authored-by: Chester Curme <chester.curme@gmail.com>
2024-06-27 12:23:05 -04:00
Eugene Yurtsev
96b72edac8
core[minor]: Add optional ID field to Document schema (#23411)
This PR adds an optional ID field to the document schema.

# 1. Optional or Required

- An optional field will require additional checking for the type
in user code (annoying).
- However, vectorstores currently don't respect this field. So if we
make it required and start returning random UUIDs, that might be even
more confusing to users.


**Proposal**: Start with Optional and convert to Required (with default
set to uuid4()) in 1-2 major releases.


# 2. Override __str__ or generic solution in prompts

Overriding __str__ as a simple way to avoid changing user code that
relies on
default str(document) in prompts. 


I considered rolling out a more general solution in prompts
(https://github.com/langchain-ai/langchain/pull/8685),
but to do that we need to:

1. Make things serializable
2. The more general solution would likely need to be backwards
compatible as well
3. It's unclear that one wants to format a List[int] in the same way as
List[Document]. The former should be `,` separated (likely), the latter
   should be `---` separated (likely).


**Proposal**: Start with the __str__ override and focus on the
vectorstore APIs; generalize prompts later.
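
A rough sketch of the two proposals combined, using a stand-in dataclass rather than the real `langchain_core` `Document` (the exact string format below is an assumption of this example): the `id` field is optional, and `__str__` leaves it out so that `str(document)` in existing prompts does not change.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Document:
    """Stand-in Document with an optional id (illustration only)."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)
    id: Optional[str] = None  # optional for now, per the proposal

    def __str__(self) -> str:
        # Leave the id out so str(document) in prompts stays the same.
        if self.metadata:
            return f"page_content={self.page_content!r} metadata={self.metadata!r}"
        return f"page_content={self.page_content!r}"


doc = Document("hello", id="abc-123")
print(str(doc))
```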
2024-06-27 12:15:58 -04:00
ccurme
5bfcb898ad
openai[patch]: bump sdk version (#23592)
Tests failing with `TypeError: Completions.create() got an unexpected
keyword argument 'parallel_tool_calls'`
2024-06-27 11:57:24 -04:00
Jacob Lee
60fc15a56b
docs[patch]: Update docs introduction and README (#23558)
CC @hwchase17 @baskaryan
2024-06-27 08:51:43 -07:00
mackong
daf733b52e
langchain[minor]: fix comment typo (#23564)
**Description:** fix typo of comment
**Issue:** N/A
**Dependencies:** N/A
2024-06-27 10:09:18 -04:00
Leonid Ganeline
2c9b84c3a8
core[patch]: docstrings agents (#23502)
Added missed docstrings. Formatted docstrings to the consistent form.
2024-06-26 17:50:48 -04:00
Leonid Ganeline
2a5d59b3d7
core[patch]: callbacks docstrings (#23375)
Added missed docstrings. Formatted docstrings to the consistent form.
2024-06-26 17:11:06 -04:00
Leonid Ganeline
1141b08eb8
core: docstrings example_selectors (#23542)
Added missed docstrings. Formatted docstrings to the consistent form.
2024-06-26 17:10:40 -04:00
wenngong
3bf1d98dbf
langchain[patch]: update agent and chains modules root_validators (#23256)
Description: update agent and chains modules Pydantic root_validators.
Issue: the issue #22819

---------

Co-authored-by: gongwn1 <gongwn1@lenovo.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Eugene Yurtsev <eugene@langchain.dev>
2024-06-26 17:09:50 -04:00