mirror of https://github.com/hwchase17/langchain
adding new chain for logical fallacy removal from model output in chain (#9887)
Description: new chain for logical fallacy removal from model output, plus docs
Issue: n/a (see above)
Dependencies: none
Tag maintainer: @hinthornw (tagged in the past from my end, but not sure who maintains chains now)
Twitter handle: no Twitter; feel free to shout out my GitHub user j-space-b
Note: created documentation in docs/extras

---------

Co-authored-by: Jon Bennion <jb@Jons-MacBook-Pro.local>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
parent 794ff2dae8
commit fed137a8a9
@ -0,0 +1,85 @@

# Removing logical fallacies from model output

Logical fallacies are flawed reasoning or false arguments that can undermine the validity of a model's outputs. Examples include circular reasoning, false dichotomies, and ad hominem attacks. Machine learning models are optimized to perform well on specific metrics like accuracy, perplexity, or loss, but optimizing for metrics alone does not guarantee logically sound reasoning.

Language models can learn to exploit flaws in reasoning to generate plausible-sounding but logically invalid arguments. When models rely on fallacies, their outputs become unreliable and untrustworthy, even if they achieve high scores on metrics; users cannot depend on such outputs. Propagating logical fallacies can spread misinformation, confuse users, and lead to harmful real-world consequences when models are deployed in products or services.

Unlike other quality issues, logical flaws are hard to monitor and test for specifically, because doing so requires reasoning about arguments rather than pattern matching.

Therefore, it is crucial that model developers proactively address logical fallacies after optimizing metrics. Specialized techniques like causal modeling, robustness testing, and bias mitigation can help avoid flawed reasoning. Allowing logical flaws to persist makes models less safe and less ethical; eliminating fallacies keeps model outputs logically valid and aligned with human reasoning, which maintains user trust and mitigates risk.
```python
# Imports
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains.llm import LLMChain
from langchain_experimental.fallacy_removal.base import FallacyChain
```

```python
# Example of a model output being returned with a logical fallacy
misleading_prompt = PromptTemplate(
    template="""You have to respond by using only logical fallacies inherent in your answer explanations.

Question: {question}

Bad answer:""",
    input_variables=["question"],
)

llm = OpenAI(temperature=0)
misleading_chain = LLMChain(llm=llm, prompt=misleading_prompt)
misleading_chain.run(question="How do I know the earth is round?")
```

<CodeOutputBlock lang="python">

```
'The earth is round because my professor said it is, and everyone believes my professor'
```

</CodeOutputBlock>

```python
# Remove the fallacy with FallacyChain, using the built-in "correction" rule
fallacies = FallacyChain.get_fallacies(["correction"])
fallacy_chain = FallacyChain.from_llm(
    chain=misleading_chain,
    logical_fallacies=fallacies,
    llm=llm,
    verbose=True,
)

fallacy_chain.run(question="How do I know the earth is round?")
```

<CodeOutputBlock lang="python">

```
> Entering new FallacyChain chain...
Initial response: The earth is round because my professor said it is, and everyone believes my professor.

Applying correction...

Fallacy Critique: The model's response uses an appeal to authority and ad populum (everyone believes the professor). Fallacy Critique Needed.

Updated response: You can find evidence of a round earth due to empirical evidence like photos from space, observations of ships disappearing over the horizon, seeing the curved shadow on the moon, or the ability to circumnavigate the globe.

> Finished chain.

'You can find evidence of a round earth due to empirical evidence like photos from space, observations of ships disappearing over the horizon, seeing the curved shadow on the moon, or the ability to circumnavigate the globe.'
```

</CodeOutputBlock>
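
The chain can also surface its intermediate work for inspection. A minimal sketch, using the `return_intermediate_steps` flag defined on `FallacyChain` (the output keys below come from `output_keys` in `base.py`):

```python
# Re-run the chain, keeping the critique and revision for inspection
inspect_chain = FallacyChain.from_llm(
    chain=misleading_chain,
    logical_fallacies=FallacyChain.get_fallacies(["correction"]),
    llm=llm,
    return_intermediate_steps=True,
)

result = inspect_chain({"question": "How do I know the earth is round?"})
print(result["initial_output"])                   # the fallacious first draft
print(result["fallacy_critiques_and_revisions"])  # list of (critique, revision) pairs
print(result["output"])                           # the final, revised answer
```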

@ -0,0 +1,4 @@

"""The Chain runs a self-review of logical fallacies as determined by this paper \
categorizing and defining logical fallacies: https://arxiv.org/pdf/2212.07425.pdf. \
Modeled after Constitutional AI and in the same format, but applying logical \
fallacies as generalized rules to remove from model output."""

@ -0,0 +1,181 @@

"""Chain for applying removals of logical fallacies."""
from __future__ import annotations

from typing import Any, Dict, List, Optional

from langchain.callbacks.manager import CallbackManagerForChainRun
from langchain.chains.base import Chain
from langchain.chains.llm import LLMChain
from langchain.schema import BasePromptTemplate
from langchain.schema.language_model import BaseLanguageModel

from langchain_experimental.fallacy_removal.fallacies import FALLACIES
from langchain_experimental.fallacy_removal.models import LogicalFallacy
from langchain_experimental.fallacy_removal.prompts import (
    FALLACY_CRITIQUE_PROMPT,
    FALLACY_REVISION_PROMPT,
)


class FallacyChain(Chain):
    """Chain for applying logical fallacy evaluations, modeled after Constitutional AI \
    and in the same format, but applying logical fallacies as generalized rules to \
    remove from output.

    Example:
        .. code-block:: python

            from langchain.llms import OpenAI
            from langchain.prompts import PromptTemplate
            from langchain.chains import LLMChain
            from langchain_experimental.fallacy_removal.base import FallacyChain
            from langchain_experimental.fallacy_removal.models import LogicalFallacy

            llm = OpenAI()

            qa_prompt = PromptTemplate(
                template="Q: {question} A:",
                input_variables=["question"],
            )
            qa_chain = LLMChain(llm=llm, prompt=qa_prompt)

            fallacy_chain = FallacyChain.from_llm(
                llm=llm,
                chain=qa_chain,
                logical_fallacies=[
                    LogicalFallacy(
                        fallacy_critique_request="Tell if this answer meets criteria.",
                        fallacy_revision_request="Give an answer that meets better criteria.",
                    )
                ],
            )

            fallacy_chain.run(question="How do I know if the earth is round?")
    """

    chain: LLMChain
    logical_fallacies: List[LogicalFallacy]
    fallacy_critique_chain: LLMChain
    fallacy_revision_chain: LLMChain
    return_intermediate_steps: bool = False

    @classmethod
    def get_fallacies(cls, names: Optional[List[str]] = None) -> List[LogicalFallacy]:
        """Look up fallacies by name in the FALLACIES registry, or return all."""
        if names is None:
            return list(FALLACIES.values())
        else:
            return [FALLACIES[name] for name in names]

    @classmethod
    def from_llm(
        cls,
        llm: BaseLanguageModel,
        chain: LLMChain,
        fallacy_critique_prompt: BasePromptTemplate = FALLACY_CRITIQUE_PROMPT,
        fallacy_revision_prompt: BasePromptTemplate = FALLACY_REVISION_PROMPT,
        **kwargs: Any,
    ) -> "FallacyChain":
        """Create a chain from an LLM."""
        fallacy_critique_chain = LLMChain(llm=llm, prompt=fallacy_critique_prompt)
        fallacy_revision_chain = LLMChain(llm=llm, prompt=fallacy_revision_prompt)
        return cls(
            chain=chain,
            fallacy_critique_chain=fallacy_critique_chain,
            fallacy_revision_chain=fallacy_revision_chain,
            **kwargs,
        )

    @property
    def input_keys(self) -> List[str]:
        """Input keys."""
        return self.chain.input_keys

    @property
    def output_keys(self) -> List[str]:
        """Output keys."""
        if self.return_intermediate_steps:
            return ["output", "fallacy_critiques_and_revisions", "initial_output"]
        return ["output"]

    def _call(
        self,
        inputs: Dict[str, Any],
        run_manager: Optional[CallbackManagerForChainRun] = None,
    ) -> Dict[str, Any]:
        _run_manager = run_manager or CallbackManagerForChainRun.get_noop_manager()
        response = self.chain.run(
            **inputs,
            callbacks=_run_manager.get_child("original"),
        )
        initial_response = response
        input_prompt = self.chain.prompt.format(**inputs)

        _run_manager.on_text(
            text="Initial response: " + response + "\n\n",
            verbose=self.verbose,
            color="yellow",
        )
        fallacy_critiques_and_revisions = []
        for logical_fallacy in self.logical_fallacies:
            # Fallacy critique below
            fallacy_raw_critique = self.fallacy_critique_chain.run(
                input_prompt=input_prompt,
                output_from_model=response,
                fallacy_critique_request=logical_fallacy.fallacy_critique_request,
                callbacks=_run_manager.get_child("fallacy_critique"),
            )
            fallacy_critique = self._parse_critique(
                output_string=fallacy_raw_critique,
            ).strip()

            # If the fallacy critique contains "no fallacy critique needed",
            # the response passes this rule; skip the revision step.
            if "no fallacy critique needed" in fallacy_critique.lower():
                fallacy_critiques_and_revisions.append((fallacy_critique, ""))
                continue

            fallacy_revision = self.fallacy_revision_chain.run(
                input_prompt=input_prompt,
                output_from_model=response,
                fallacy_critique_request=logical_fallacy.fallacy_critique_request,
                fallacy_critique=fallacy_critique,
                revision_request=logical_fallacy.fallacy_revision_request,
                callbacks=_run_manager.get_child("fallacy_revision"),
            ).strip()
            response = fallacy_revision
            fallacy_critiques_and_revisions.append((fallacy_critique, fallacy_revision))

            _run_manager.on_text(
                text=f"Applying {logical_fallacy.name}..." + "\n\n",
                verbose=self.verbose,
                color="green",
            )

            _run_manager.on_text(
                text="Logical Fallacy: " + fallacy_critique + "\n\n",
                verbose=self.verbose,
                color="blue",
            )

            _run_manager.on_text(
                text="Updated response: " + fallacy_revision + "\n\n",
                verbose=self.verbose,
                color="yellow",
            )

        final_output: Dict[str, Any] = {"output": response}
        if self.return_intermediate_steps:
            final_output["initial_output"] = initial_response
            final_output[
                "fallacy_critiques_and_revisions"
            ] = fallacy_critiques_and_revisions
        return final_output

    @staticmethod
    def _parse_critique(output_string: str) -> str:
        if "Fallacy Revision request:" not in output_string:
            return output_string
        output_string = output_string.split("Fallacy Revision request:")[0]
        if "\n\n" in output_string:
            output_string = output_string.split("\n\n")[0]
        return output_string
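
# Illustrative sketch (not part of the module) of what _parse_critique does,
# mirroring the unit tests further down: everything after the
# "Fallacy Revision request:" marker, and any section after a blank line,
# is dropped from the raw critique.
#
#   raw = "Needs work.\n\nFallacy Revision request: fix it.\n\nFallacy Revision:"
#   FallacyChain._parse_critique(raw)  # -> "Needs work."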

@ -0,0 +1,10 @@

"""Models for the Logical Fallacy Chain"""
from langchain_experimental.pydantic_v1 import BaseModel


class LogicalFallacy(BaseModel):
    """Class for a logical fallacy."""

    fallacy_critique_request: str
    fallacy_revision_request: str
    name: str = "Logical Fallacy"
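
# Sketch of a custom rule in the shape the chain expects; the field values
# below are illustrative, not entries from the built-in FALLACIES registry:
#
#   ad_hominem = LogicalFallacy(
#       name="adhominem",
#       fallacy_critique_request="Identify any attacks on the person rather than the argument.",
#       fallacy_revision_request="Revise the answer to address the argument itself.",
#   )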

@ -0,0 +1,26 @@

"""Unit tests for the Logical Fallacy chain, same format as CAI"""
from langchain_experimental.fallacy_removal.base import FallacyChain

TEXT_ONE = """ This text is bad.\

Fallacy Revision request: Make it great.\

Fallacy Revision:"""

TEXT_TWO = """ This text is bad.\n\n"""

TEXT_THREE = """ This text is bad.\

Fallacy Revision request: Make it great again.\

Fallacy Revision: Better text"""


def test_fallacy_critique_parsing() -> None:
    """Test parsing of critique text."""
    for text in [TEXT_ONE, TEXT_TWO, TEXT_THREE]:
        fallacy_critique = FallacyChain._parse_critique(text)

        assert (
            fallacy_critique.strip() == "This text is bad."
        ), f"Failed on {text} with {fallacy_critique}"