Adding a new chain for logical fallacy removal from model output (#9887)

Description: adds a new chain that removes logical fallacies from model
output, along with documentation.
Issue: n/a (see description above)
Dependencies: none
Tag maintainer: @hinthornw has handled my past PRs, but I'm not sure who
currently maintains chains.
Twitter handle: no Twitter; feel free to call out my GitHub user,
j-space-b, for any shout-out.

Note: created documentation in docs/extras

---------

Co-authored-by: Jon Bennion <jb@Jons-MacBook-Pro.local>
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>

@ -4,4 +4,5 @@ One of the key concerns with using LLMs is that they may generate harmful or une
- [Moderation chain](/docs/guides/safety/moderation): Explicitly check if any output text is harmful and flag it.
- [Constitutional chain](/docs/guides/safety/constitutional_chain): Prompt the model with a set of principles which should guide its behavior.
- [Logical Fallacy chain](/docs/guides/safety/logical_fallacy_chain): Checks the model output against logical fallacies to correct any deviation.
- [Amazon Comprehend moderation chain](/docs/guides/safety/amazon_comprehend_chain): Use [Amazon Comprehend](https://aws.amazon.com/comprehend/) to detect and handle PII and toxicity.

@ -0,0 +1,85 @@
# Removing logical fallacies from model output
Logical fallacies are flawed reasoning or false arguments that can undermine the validity of a model's outputs. Examples include circular reasoning, false dichotomies, and ad hominem attacks. Machine learning models are optimized to perform well on specific metrics like accuracy, perplexity, or loss, but optimizing for metrics alone does not guarantee logically sound reasoning.

Language models can learn to exploit flaws in reasoning to generate plausible-sounding but logically invalid arguments. When models rely on fallacies, their outputs become unreliable and untrustworthy even if they score well on those metrics, and users cannot depend on them. Propagating logical fallacies can spread misinformation, confuse users, and lead to harmful real-world consequences when models are deployed in products or services.

Unlike other quality issues, logical flaws are difficult to monitor and test for, because catching them requires reasoning about arguments rather than pattern matching.

It is therefore crucial that model developers proactively address logical fallacies after optimizing metrics. Specialized techniques like causal modeling, robustness testing, and bias mitigation can help avoid flawed reasoning. Allowing logical flaws to persist makes models less safe and ethical, while eliminating fallacies keeps model outputs logically valid and aligned with human reasoning. This maintains user trust and mitigates risk.
```python
# Imports
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains.llm import LLMChain
from langchain_experimental.fallacy_removal.base import FallacyChain
```
```python
# Example of a model output being returned with a logical fallacy
misleading_prompt = PromptTemplate(
    template="""You have to respond by using only logical fallacies inherent in your answer explanations.
Question: {question}
Bad answer:""",
    input_variables=["question"],
)
llm = OpenAI(temperature=0)
misleading_chain = LLMChain(llm=llm, prompt=misleading_prompt)
misleading_chain.run(question="How do I know the earth is round?")
```
<CodeOutputBlock lang="python">
```
'The earth is round because my professor said it is, and everyone believes my professor'
```
</CodeOutputBlock>
```python
# "adpopulum" is one of the keys defined in the FALLACIES registry
fallacies = FallacyChain.get_fallacies(["adpopulum"])
fallacy_chain = FallacyChain.from_llm(
    chain=misleading_chain,
    logical_fallacies=fallacies,
    llm=llm,
    verbose=True,
)
fallacy_chain.run(question="How do I know the earth is round?")
```
<CodeOutputBlock lang="python">
```
> Entering new FallacyChain chain...
Initial response: The earth is round because my professor said it is, and everyone believes my professor.
Applying adpopulum...
Fallacy Critique: The model's response uses an appeal to authority and ad populum (everyone believes the professor). Fallacy Critique Needed.
Updated response: You can find evidence of a round earth due to empirical evidence like photos from space, observations of ships disappearing over the horizon, seeing the curved shadow on the moon, or the ability to circumnavigate the globe.
> Finished chain.
'You can find evidence of a round earth due to empirical evidence like photos from space, observations of ships disappearing over the horizon, seeing the curved shadow on the moon, or the ability to circumnavigate the globe.'
```
</CodeOutputBlock>
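
The chain can also return its intermediate work. Below is a minimal sketch (not part of the original notebook) based on the chain's `output_keys`: constructing it with `return_intermediate_steps=True` exposes the unrevised answer and the (critique, revision) pairs alongside the final output.

```python
# Sketch: inspecting intermediate critiques and revisions.
fallacy_chain = FallacyChain.from_llm(
    chain=misleading_chain,
    logical_fallacies=FallacyChain.get_fallacies(["adpopulum"]),
    llm=llm,
    return_intermediate_steps=True,
)

result = fallacy_chain({"question": "How do I know the earth is round?"})
print(result["initial_output"])                   # unrevised model answer
print(result["fallacy_critiques_and_revisions"])  # list of (critique, revision) tuples
print(result["output"])                           # final, revised answer
```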

@ -0,0 +1,4 @@
"""The Chain runs a self-review of logical fallacies as determined by this paper \
categorizing and defining logical fallacies https://arxiv.org/pdf/2212.07425.pdf. \
Modeled after Constitutional AI and in same format, but applying logical \
fallacies as generalized rules to remove in output"""

@ -0,0 +1,181 @@
"""Chain for applying removals of logical fallacies."""
from __future__ import annotations
from typing import Any, Dict, List, Optional
from langchain.callbacks.manager import CallbackManagerForChainRun
from langchain.chains.base import Chain
from langchain.chains.llm import LLMChain
from langchain.schema import BasePromptTemplate
from langchain.schema.language_model import BaseLanguageModel
from langchain_experimental.fallacy_removal.fallacies import FALLACIES
from langchain_experimental.fallacy_removal.models import LogicalFallacy
from langchain_experimental.fallacy_removal.prompts import (
FALLACY_CRITIQUE_PROMPT,
FALLACY_REVISION_PROMPT,
)
class FallacyChain(Chain):
"""Chain for applying logical fallacy evaluations, modeled after Constitutional AI \
and in same format, but applying logical fallacies as generalized rules to remove \
in output
Example:
.. code-block:: python
from langchain.llms import OpenAI
from langchain.chains import LLMChain
from langchain_experimental.fallacy import FallacyChain
from langchain_experimental.fallacy_removal.models import LogicalFallacy
llm = OpenAI()
qa_prompt = PromptTemplate(
template="Q: {question} A:",
input_variables=["question"],
)
qa_chain = LLMChain(llm=llm, prompt=qa_prompt)
fallacy_chain = FallacyChain.from_llm(
llm=llm,
chain=qa_chain,
logical_fallacies=[
LogicalFallacy(
fallacy_critique_request="Tell if this answer meets criteria.",
fallacy_revision_request=\
"Give an answer that meets better criteria.",
)
],
)
fallacy_chain.run(question="How do I know if the earth is round?")
"""
chain: LLMChain
logical_fallacies: List[LogicalFallacy]
fallacy_critique_chain: LLMChain
fallacy_revision_chain: LLMChain
return_intermediate_steps: bool = False
@classmethod
def get_fallacies(cls, names: Optional[List[str]] = None) -> List[LogicalFallacy]:
if names is None:
return list(FALLACIES.values())
else:
return [FALLACIES[name] for name in names]
@classmethod
def from_llm(
cls,
llm: BaseLanguageModel,
chain: LLMChain,
fallacy_critique_prompt: BasePromptTemplate = FALLACY_CRITIQUE_PROMPT,
fallacy_revision_prompt: BasePromptTemplate = FALLACY_REVISION_PROMPT,
**kwargs: Any,
) -> "FallacyChain":
"""Create a chain from an LLM."""
fallacy_critique_chain = LLMChain(llm=llm, prompt=fallacy_critique_prompt)
fallacy_revision_chain = LLMChain(llm=llm, prompt=fallacy_revision_prompt)
return cls(
chain=chain,
fallacy_critique_chain=fallacy_critique_chain,
fallacy_revision_chain=fallacy_revision_chain,
**kwargs,
)
@property
def input_keys(self) -> List[str]:
"""Input keys."""
return self.chain.input_keys
@property
def output_keys(self) -> List[str]:
"""Output keys."""
if self.return_intermediate_steps:
return ["output", "fallacy_critiques_and_revisions", "initial_output"]
return ["output"]
def _call(
self,
inputs: Dict[str, Any],
run_manager: Optional[CallbackManagerForChainRun] = None,
) -> Dict[str, Any]:
_run_manager = run_manager or CallbackManagerForChainRun.get_noop_manager()
response = self.chain.run(
**inputs,
callbacks=_run_manager.get_child("original"),
)
initial_response = response
input_prompt = self.chain.prompt.format(**inputs)
_run_manager.on_text(
text="Initial response: " + response + "\n\n",
verbose=self.verbose,
color="yellow",
)
fallacy_critiques_and_revisions = []
for logical_fallacy in self.logical_fallacies:
# Fallacy critique below
fallacy_raw_critique = self.fallacy_critique_chain.run(
input_prompt=input_prompt,
output_from_model=response,
fallacy_critique_request=logical_fallacy.fallacy_critique_request,
callbacks=_run_manager.get_child("fallacy_critique"),
)
fallacy_critique = self._parse_critique(
output_string=fallacy_raw_critique,
).strip()
# if fallacy critique contains "No fallacy critique needed" then done
if "no fallacy critique needed" in fallacy_critique.lower():
fallacy_critiques_and_revisions.append((fallacy_critique, ""))
continue
fallacy_revision = self.fallacy_revision_chain.run(
input_prompt=input_prompt,
output_from_model=response,
fallacy_critique_request=logical_fallacy.fallacy_critique_request,
fallacy_critique=fallacy_critique,
revision_request=logical_fallacy.fallacy_revision_request,
callbacks=_run_manager.get_child("fallacy_revision"),
).strip()
response = fallacy_revision
fallacy_critiques_and_revisions.append((fallacy_critique, fallacy_revision))
_run_manager.on_text(
text=f"Applying {logical_fallacy.name}..." + "\n\n",
verbose=self.verbose,
color="green",
)
_run_manager.on_text(
text="Logical Fallacy: " + fallacy_critique + "\n\n",
verbose=self.verbose,
color="blue",
)
_run_manager.on_text(
text="Updated response: " + fallacy_revision + "\n\n",
verbose=self.verbose,
color="yellow",
)
final_output: Dict[str, Any] = {"output": response}
if self.return_intermediate_steps:
final_output["initial_output"] = initial_response
final_output[
"fallacy_critiques_and_revisions"
] = fallacy_critiques_and_revisions
return final_output
@staticmethod
def _parse_critique(output_string: str) -> str:
if "Fallacy Revision request:" not in output_string:
return output_string
output_string = output_string.split("Fallacy Revision request:")[0]
if "\n\n" in output_string:
output_string = output_string.split("\n\n")[0]
return output_string
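
Since `from_llm` accepts `fallacy_critique_prompt` and `fallacy_revision_prompt` as parameters, callers are not tied to the few-shot defaults. The sketch below is illustrative, not part of the shipped code: `custom_critique_prompt` and `custom_revision_prompt` are assumed names, and their templates simply reuse the variable names the chain passes in.

```python
# Sketch: supplying custom critique/revision prompts to FallacyChain.from_llm.
# These plain PromptTemplates are stand-ins for the default few-shot
# FALLACY_CRITIQUE_PROMPT / FALLACY_REVISION_PROMPT.
from langchain.llms import OpenAI
from langchain.chains.llm import LLMChain
from langchain.prompts import PromptTemplate
from langchain_experimental.fallacy_removal.base import FallacyChain

llm = OpenAI(temperature=0)
qa_chain = LLMChain(
    llm=llm,
    prompt=PromptTemplate(template="Q: {question} A:", input_variables=["question"]),
)

custom_critique_prompt = PromptTemplate(
    template=(
        "Human: {input_prompt}\n"
        "Model: {output_from_model}\n"
        "Fallacy Critique Request: {fallacy_critique_request}\n"
        "Fallacy Critique:"
    ),
    input_variables=["input_prompt", "output_from_model", "fallacy_critique_request"],
)
custom_revision_prompt = PromptTemplate(
    template=(
        "Human: {input_prompt}\n"
        "Model: {output_from_model}\n"
        "Fallacy Critique Request: {fallacy_critique_request}\n"
        "Fallacy Critique: {fallacy_critique}\n"
        "Fallacy Revision Request: {fallacy_revision_request}\n"
        "Fallacy Revision:"
    ),
    input_variables=[
        "input_prompt",
        "output_from_model",
        "fallacy_critique_request",
        "fallacy_critique",
        "fallacy_revision_request",
    ],
)

fallacy_chain = FallacyChain.from_llm(
    llm=llm,
    chain=qa_chain,
    logical_fallacies=FallacyChain.get_fallacies(["adhominem"]),
    fallacy_critique_prompt=custom_critique_prompt,
    fallacy_revision_prompt=custom_revision_prompt,
    verbose=True,
)
# fallacy_chain.run(question="How do I know the earth is round?")
```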

@ -0,0 +1,209 @@
"""Classification of logical fallacies in Natural Language Arguments \
from https://arxiv.org/pdf/2212.07425.pdf
"""
from typing import Dict
from langchain_experimental.fallacy_removal.models import LogicalFallacy
FALLACIES: Dict[str, LogicalFallacy] = {
"adhominem": LogicalFallacy(
name="adhominem",
fallacy_critique_request="Identify any feasible ways in which \
the assistants last response is attacking the character or \
personal traits of the person making an argument rather than \
addressing the actual argument and evidence.",
fallacy_revision_request="Please rewrite the assistant response\
to remove any attacking the character or personal traits of the\
person making an argument rather than addressing the actual\
argument and evidence.",
),
"adpopulum": LogicalFallacy(
name="adpopulum",
fallacy_critique_request="Identify ways in which the assistants\
last response may be asserting that something must be true or \
correct simply because many people believe it or do it, without \
actual facts or evidence to support the conclusion.",
fallacy_revision_request="Please rewrite the assistant response \
to remove any assertion that something must be true or correct \
simply because many people believe it or do it, without actual \
facts or evidence to support the conclusion.",
),
"appealtoemotion": LogicalFallacy(
name="appealtoemotion",
fallacy_critique_request="Identify all ways in which the \
assistants last response is an attempt to win support for an \
argument by exploiting or manipulating people's emotions rather \
than using facts and reason.",
fallacy_revision_request="Please rewrite the assistant response \
to remove any attempt to win support for an argument by \
exploiting or manipulating people's emotions rather than using \
facts and reason.",
),
"fallacyofextension": LogicalFallacy(
name="fallacyofextension",
fallacy_critique_request="Identify any ways in which the \
assistant's last response is making broad, sweeping generalizations\
and extending the implications of an argument far beyond what the \
initial premises support.",
fallacy_revision_request="Rewrite the assistant response to remove\
all broad, sweeping generalizations and extending the implications\
of an argument far beyond what the initial premises support.",
),
"intentionalfallacy": LogicalFallacy(
name="intentionalfallacy",
fallacy_critique_request="Identify any way in which the assistants\
last response may be falsely supporting a conclusion by claiming to\
understand an author or creator's subconscious intentions without \
clear evidence.",
fallacy_revision_request="Revise the assistants last response to \
remove any false support of a conclusion by claiming to understand\
an author or creator's subconscious intentions without clear \
evidence.",
),
"falsecausality": LogicalFallacy(
name="falsecausality",
fallacy_critique_request="Think carefully about whether the \
assistant's last response is jumping to conclusions about causation\
between events or circumstances without adequate evidence to infer \
a causal relationship.",
fallacy_revision_request="Please write a new version of the \
assistants response that removes jumping to conclusions about\
causation between events or circumstances without adequate \
evidence to infer a causal relationship.",
),
"falsedilemma": LogicalFallacy(
name="falsedilemma",
fallacy_critique_request="Identify any way in which the \
assistant's last response may be presenting only two possible options\
or sides to a situation when there are clearly other alternatives \
that have not been considered or addressed.",
fallacy_revision_request="Amend the assistants last response to \
remove any presentation of only two possible options or sides to a \
situation when there are clearly other alternatives that have not \
been considered or addressed.",
),
"hastygeneralization": LogicalFallacy(
name="hastygeneralization",
fallacy_critique_request="Identify any way in which the assistants\
last response is making a broad inference or generalization to \
situations, people, or circumstances that are not sufficiently \
similar based on a specific example or limited evidence.",
fallacy_revision_request="Please rewrite the assistant response to\
remove a broad inference or generalization to situations, people, \
or circumstances that are not sufficiently similar based on a \
specific example or limited evidence.",
),
"illogicalarrangement": LogicalFallacy(
name="illogicalarrangement",
fallacy_critique_request="Think carefully about any ways in which \
the assistant's last response is constructing an argument in a \
flawed, illogical way, so the premises do not connect to or lead\
to the conclusion properly.",
fallacy_revision_request="Please rewrite the assistants response\
so as to remove any construction of an argument that is flawed and\
illogical or if the premises do not connect to or lead to the \
conclusion properly.",
),
"fallacyofcredibility": LogicalFallacy(
name="fallacyofcredibility",
fallacy_critique_request="Discuss whether the assistant's last \
response was dismissing or attacking the credibility of the person\
making an argument rather than directly addressing the argument \
itself.",
fallacy_revision_request="Revise the assistants response so as \
that it refrains from dismissing or attacking the credibility of\
the person making an argument rather than directly addressing \
the argument itself.",
),
"circularreasoning": LogicalFallacy(
name="circularreasoning",
fallacy_critique_request="Discuss ways in which the assistants\
last response may be supporting a premise by simply repeating \
the premise as the conclusion without giving actual proof or \
evidence.",
fallacy_revision_request="Revise the assistants response if \
possible so that it's not supporting a premise by simply \
repeating the premise as the conclusion without giving actual\
proof or evidence.",
),
"beggingthequestion": LogicalFallacy(
name="beggingthequestion",
fallacy_critique_request="Discuss ways in which the assistant's\
last response is restating the conclusion of an argument as a \
premise without providing actual support for the conclusion in \
the first place.",
fallacy_revision_request="Write a revision of the assistants \
response that refrains from restating the conclusion of an \
argument as a premise without providing actual support for the \
conclusion in the first place.",
),
"trickquestion": LogicalFallacy(
name="trickquestion",
fallacy_critique_request="Identify ways in which the \
assistants last response is asking a question that \
contains or assumes information that has not been proven or \
substantiated.",
fallacy_revision_request="Please write a new assistant \
response so that it does not ask a question that contains \
or assumes information that has not been proven or \
substantiated.",
),
"overapplier": LogicalFallacy(
name="overapplier",
fallacy_critique_request="Identify ways in which the assistants\
last response is applying a general rule or generalization to a \
specific case it was not meant to apply to.",
fallacy_revision_request="Please write a new response that does\
not apply a general rule or generalization to a specific case \
it was not meant to apply to.",
),
"equivocation": LogicalFallacy(
name="equivocation",
fallacy_critique_request="Read the assistants last response \
carefully and identify if it is using the same word or phrase \
in two different senses or contexts within an argument.",
fallacy_revision_request="Rewrite the assistant response so \
that it does not use the same word or phrase in two different \
senses or contexts within an argument.",
),
"amphiboly": LogicalFallacy(
name="amphiboly",
fallacy_critique_request="Critique the assistants last response\
to see if it is constructing sentences such that the grammar \
or structure is ambiguous, leading to multiple interpretations.",
fallacy_revision_request="Please rewrite the assistant response\
to remove any construction of sentences where the grammar or \
structure is ambiguous or leading to multiple interpretations.",
),
"accent": LogicalFallacy(
name="accent",
fallacy_critique_request="Discuss whether the assitant's response\
is misrepresenting an argument by shifting the emphasis of a word\
or phrase to give it a different meaning than intended.",
fallacy_revision_request="Please rewrite the AI model's response\
so that it is not misrepresenting an argument by shifting the \
emphasis of a word or phrase to give it a different meaning than\
intended.",
),
"composition": LogicalFallacy(
name="composition",
fallacy_critique_request="Discuss whether the assistant's \
response is erroneously inferring that something is true of \
the whole based on the fact that it is true of some part or \
parts.",
fallacy_revision_request="Please rewrite the assitant's response\
so that it is not erroneously inferring that something is true \
of the whole based on the fact that it is true of some part or \
parts.",
),
"division": LogicalFallacy(
name="division",
fallacy_critique_request="Discuss whether the assistant's last \
response is erroneously inferring that something is true of the \
parts based on the fact that it is true of the whole.",
fallacy_revision_request="Please rewrite the assitant's response\
so that it is not erroneously inferring that something is true \
of the parts based on the fact that it is true of the whole.",
),
}
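
The keys of this registry are the names accepted by `FallacyChain.get_fallacies`; calling it with no names returns every registered fallacy. A short sketch of how the registry is typically queried:

```python
# Sketch: listing and selecting registered fallacies.
from langchain_experimental.fallacy_removal.base import FallacyChain
from langchain_experimental.fallacy_removal.fallacies import FALLACIES

print(sorted(FALLACIES))  # e.g. ['accent', 'adhominem', 'adpopulum', ...]

all_fallacies = FallacyChain.get_fallacies()  # every LogicalFallacy in the registry
subset = FallacyChain.get_fallacies(["adhominem", "circularreasoning"])
```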

@ -0,0 +1,10 @@
"""Models for the Logical Fallacy Chain"""
from langchain_experimental.pydantic_v1 import BaseModel
class LogicalFallacy(BaseModel):
"""Class for a logical fallacy."""
fallacy_critique_request: str
fallacy_revision_request: str
name: str = "Logical Fallacy"

@ -0,0 +1,135 @@
from langchain.prompts.few_shot import FewShotPromptTemplate
from langchain.prompts.prompt import PromptTemplate
fallacy_critique_example = PromptTemplate(
template="""Human: {input_prompt}
Model: {output_from_model}
Fallacy Critique Request: {fallacy_critique_request}
Fallacy Critique: {fallacy_critique}""",
input_variables=[
"input_prompt",
"output_from_model",
"fallacy_critique_request",
"fallacy_critique",
],
)
examples = [
{
"input_prompt": "If everyone says the Earth is round, \
how do I know that's correct?",
"output_from_model": "The earth is round because your \
teacher says it is",
"fallacy_critique_request": "Identify specific ways in\
which the models previous response had a logical fallacy. \
Also point out potential logical fallacies in the humans \
questions and responses. Examples of logical fallacies \
include but are not limited to ad hominem, ad populum, \
appeal to emotion and false causality.",
"fallacy_critique": "This statement contains the logical \
fallacy of Ad Verecundiam or Appeal to Authority. It is \
a fallacy because it asserts something to be true purely \
based on the authority of the source making the claim, \
without any actual evidence to support it. Fallacy \
Critique Needed",
"fallacy_revision_request": "Please rewrite the model \
response to remove all logical fallacies, and to \
politely point out any logical fallacies from the \
human.",
"fallacy_revision": "The earth is round based on \
evidence from observations of its curvature from high \
altitudes, photos from space showing its spherical shape, \
circumnavigation, and the fact that we see its rounded \
shadow on the moon during lunar eclipses.",
},
{
"input_prompt": "Should we invest more in our school \
music program? After all, studies show students \
involved in music perform better academically.",
"output_from_model": "I don't think we should invest \
more in the music program. Playing the piccolo won't \
teach someone better math skills.",
"fallacy_critique_request": "Identify specific ways \
in which the models previous response had a logical \
fallacy. Also point out potential logical fallacies \
in the humans questions and responses. Examples of \
logical fallacies include but are not limited to ad \
hominem, ad populum, appeal to emotion and false \
causality.",
"fallacy_critique": "This answer commits the division \
fallacy by rejecting the argument based on assuming \
capabilities true of the parts (playing an instrument \
like piccolo) also apply to the whole \
(the full music program). The answer focuses only on \
part of the music program rather than considering it \
as a whole. Fallacy Critique Needed.",
"fallacy_revision_request": "Please rewrite the model \
response to remove all logical fallacies, and to \
politely point out any logical fallacies from the human.",
"fallacy_revision": "While playing an instrument may \
teach discipline, more evidence is needed on whether \
music education courses improve critical thinking \
skills across subjects before determining if increased \
investment in the whole music program is warranted.",
},
]
FALLACY_CRITIQUE_PROMPT = FewShotPromptTemplate(
example_prompt=fallacy_critique_example,
examples=[
{k: v for k, v in e.items() if k != "fallacy_revision_request"}
for e in examples
],
prefix="Below is a conversation between a human and an \
AI assistant. If there is no material critique of the \
model output, append to the end of the Fallacy Critique: \
'No fallacy critique needed.' If there is material \
critique \
of the model output, append to the end of the Fallacy \
Critique: 'Fallacy Critique needed.'",
suffix="""Human: {input_prompt}
Model: {output_from_model}
Fallacy Critique Request: {fallacy_critique_request}
Fallacy Critique:""",
example_separator="\n === \n",
input_variables=["input_prompt", "output_from_model", "fallacy_critique_request"],
)
FALLACY_REVISION_PROMPT = FewShotPromptTemplate(
example_prompt=fallacy_critique_example,
examples=examples,
prefix="Below is a conversation between a human and \
an AI assistant.",
suffix="""Human: {input_prompt}
Model: {output_from_model}
Fallacy Critique Request: {fallacy_critique_request}
Fallacy Critique: {fallacy_critique}
If the fallacy critique does not identify anything worth \
changing, ignore the Fallacy Revision Request and do not \
make any revisions. Instead, return "No revisions needed".
If the fallacy critique does identify something worth \
changing, please revise the model response based on the \
Fallacy Revision Request.
Fallacy Revision Request: {fallacy_revision_request}
Fallacy Revision:""",
example_separator="\n === \n",
input_variables=[
"input_prompt",
"output_from_model",
"fallacy_critique_request",
"fallacy_critique",
"fallacy_revision_request",
],
)
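
One way to see exactly what the critique model receives is to render the few-shot prompt directly. The values below are placeholders, not part of the original examples:

```python
# Sketch: rendering the few-shot critique prompt for inspection.
from langchain_experimental.fallacy_removal.prompts import FALLACY_CRITIQUE_PROMPT

rendered = FALLACY_CRITIQUE_PROMPT.format(
    input_prompt="How do I know the earth is round?",
    output_from_model="The earth is round because my professor said it is.",
    fallacy_critique_request=(
        "Identify specific ways in which the model's previous response "
        "had a logical fallacy."
    ),
)
print(rendered)  # prefix, the two few-shot examples, then the new case
```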

@ -0,0 +1,26 @@
"""Unit tests for the Logical Fallacy chain, same format as CAI"""
from langchain_experimental.fallacy_removal.base import FallacyChain
TEXT_ONE = """ This text is bad.\
Fallacy Revision request: Make it great.\
Fallacy Revision:"""
TEXT_TWO = """ This text is bad.\n\n"""
TEXT_THREE = """ This text is bad.\
Fallacy Revision request: Make it great again.\
Fallacy Revision: Better text"""
def test_fallacy_critique_parsing() -> None:
    """Test parsing of critique text."""
    for text in [TEXT_ONE, TEXT_TWO, TEXT_THREE]:
        fallacy_critique = FallacyChain._parse_critique(text)

        assert (
            fallacy_critique.strip() == "This text is bad."
        ), f"Failed on {text} with {fallacy_critique}"