Mirror of https://github.com/hwchase17/langchain, synced 2024-11-10 01:10:59 +00:00

Commit a50eabbd48
- Description: Modified the prompt created by the function `create_unstructured_prompt` (which is called for LLMs that do not support function calling) by adding conditional checks that verify whether restrictions on entity types and `rel_types` should be added to the prompt. If the user provides a sufficiently large text, the current prompt **may** fail to produce results in some LLMs. I first saw this issue when I implemented a custom LLM class that did not support function calling and used Gemini 1.5 Pro, but I was able to reproduce it with OpenAI models.

Load a sufficiently large text:

```python
from langchain_openai import ChatOpenAI, OpenAI
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_core.documents import Document

with open("texto-longo.txt", "r") as file:
    full_text = file.read()

# Cropped to fit the GPT-3.5 context window
partial_text = full_text[:4000]

documents = [Document(page_content=partial_text)]
```

With the chat class (which supports function calling):

```python
chat_openai = ChatOpenAI(model="gpt-3.5-turbo", model_kwargs={"seed": 42})
chat_gpt35_transformer = LLMGraphTransformer(llm=chat_openai)
graph_from_chat_gpt35 = chat_gpt35_transformer.convert_to_graph_documents(documents)
```

it works:

```
>>> print(graph_from_chat_gpt35[0].nodes)
[Node(id="Jesu, Joy of Man's Desiring", type='Music'), Node(id='Godel', type='Person'), Node(id='Johann Sebastian Bach', type='Person'), Node(id='clever way of encoding the complicated expressions as numbers', type='Concept')]
```

But the non-chat LLM class (which does not support function calling)

```python
openai = OpenAI(
    model="gpt-3.5-turbo-instruct",
    max_tokens=1000,
)
gpt35_transformer = LLMGraphTransformer(llm=openai)
graph_from_gpt35 = gpt35_transformer.convert_to_graph_documents(documents)
```

uses the problematic prompt and sometimes produces no result at all:

```
>>> print(graph_from_gpt35[0].nodes)
[]
```

After implementing the changes, I was able to use both classes more consistently:

```shell
>>> chat_gpt35_transformer = LLMGraphTransformer(llm=chat_openai)
>>> graph_from_chat_gpt35 = chat_gpt35_transformer.convert_to_graph_documents(documents)
>>> print(graph_from_chat_gpt35[0].nodes)
[Node(id="Jesu, Joy Of Man'S Desiring", type='Music'), Node(id='Johann Sebastian Bach', type='Person'), Node(id='Godel', type='Person')]
>>> gpt35_transformer = LLMGraphTransformer(llm=openai)
>>> graph_from_gpt35 = gpt35_transformer.convert_to_graph_documents(documents)
>>> print(graph_from_gpt35[0].nodes)
[Node(id='I', type='Pronoun'), Node(id="JESU, JOY OF MAN'S DESIRING", type='Song'), Node(id='larger memory', type='Memory'), Node(id='this nice tree structure', type='Structure'), Node(id='how you can do it all with the numbers', type='Process'), Node(id='JOHANN SEBASTIAN BACH', type='Composer'), Node(id='type of structure', type='Characteristic'), Node(id='that', type='Pronoun'), Node(id='we', type='Pronoun'), Node(id='worry', type='Verb')]
```

The results are still somewhat inconsistent because the GPT-3.5 model may produce incomplete JSON due to the token limit, but that could be solved (or mitigated) by checking for complete JSON when parsing it; see the sketches below.
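To make the change concrete, here is a minimal sketch of the conditional checks described above. This is not the PR's actual code; `build_unstructured_prompt` is a hypothetical name, and the prompt wording is invented for illustration. The point is simply that the type restrictions are appended only when the caller supplies them:

```python
from typing import List, Optional


def build_unstructured_prompt(
    node_labels: Optional[List[str]] = None,
    rel_types: Optional[List[str]] = None,
) -> str:
    """Build the extraction prompt, restricting types only when asked to."""
    lines = [
        "Extract entities (nodes) and relationships from the text below,",
        "and return them as JSON.",
    ]
    # Only mention allowed node types when the caller passed a whitelist;
    # emitting an empty restriction is what confused some LLMs.
    if node_labels:
        lines.append("Use only the following node types: " + ", ".join(node_labels))
    # Same conditional treatment for relationship types.
    if rel_types:
        lines.append("Use only the following relationship types: " + ", ".join(rel_types))
    return "\n".join(lines)


print(build_unstructured_prompt(node_labels=["Person", "Music"]))
```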
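And a hypothetical illustration of the JSON-completeness check suggested as a mitigation for the truncation issue; `parse_if_complete` is an invented helper, not part of the library:

```python
import json
from typing import Any, Optional


def parse_if_complete(raw: str) -> Optional[Any]:
    """Return the parsed JSON, or None if the model output was truncated."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # The output was probably cut off by max_tokens; the caller can
        # retry with a smaller chunk or a larger token budget instead of
        # silently building an empty graph.
        return None


assert parse_if_complete('[{"id": "Godel", "type": "Person"}]') is not None
assert parse_if_complete('[{"id": "Godel", "type"') is None  # truncated output
```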
Directory listing at this commit:

- agents/
- autonomous_agents/
- chat_models/
- comprehend_moderation/
- cpal/
- data_anonymizer/
- fallacy_removal/
- generative_agents/
- graph_transformers/
- llm_bash/
- llm_symbolic_math/
- llms/
- open_clip/
- openai_assistant/
- pal_chain/
- plan_and_execute/
- prompt_injection_identifier/
- prompts/
- pydantic_v1/
- recommenders/
- retrievers/
- rl_chain/
- smart_llm/
- sql/
- synthetic_data/
- tabular_synthetic_data/
- tools/
- tot/
- utilities/
- video_captioning/
- __init__.py
- py.typed
- text_splitter.py