## Description
This PR adds support for
[lm-format-enforcer](https://github.com/noamgat/lm-format-enforcer) to
LangChain.
![image](https://raw.githubusercontent.com/noamgat/lm-format-enforcer/main/docs/Intro.webp)
The library is similar to jsonformer / RELLM which are supported in
Langchain, but has several advantages such as
- Batching and Beam search support
- More complete JSON Schema support
- LLM has control over whitespace, improving quality
- Better runtime performance due to only calling the LLM's generate()
function once per generate() call.
The integration is loosely based on the jsonformer integration in terms
of project structure.
## Dependencies
No compile-time dependency was added, but if `lm-format-enforcer` is not
installed, a runtime error will occur if it is trying to be used.
## Tests
Due to the integration modifying the internal parameters of the
underlying huggingface transformer LLM, it is not possible to test
without building a real LM, which requires internet access. So, similar
to the jsonformer and RELLM integrations, the testing is via the
notebook.
## Twitter Handle
[@noamgat](https://twitter.com/noamgat)
Looking forward to hearing feedback!
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Best to review one commit at a time, since two of the commits are 100%
autogenerated changes from running `ruff format`:
- Install and use `ruff format` instead of black for code formatting.
- Output of `ruff format .` in the `langchain` package.
- Use `ruff format` in experimental package.
- Format changes in experimental package by `ruff format`.
- Manual formatting fixes to make `ruff .` pass.
This PR replaces the previous `Intent` check with the new `Prompt
Safety` check. The logic and steps to enable chain moderation via the
Amazon Comprehend service, allowing you to detect and redact PII, Toxic,
and Prompt Safety information in the LLM prompt or answer remains
unchanged.
This implementation updates the code and configuration types with
respect to `Prompt Safety`.
### Usage sample
```python
from langchain_experimental.comprehend_moderation import (BaseModerationConfig,
ModerationPromptSafetyConfig,
ModerationPiiConfig,
ModerationToxicityConfig
)
pii_config = ModerationPiiConfig(
labels=["SSN"],
redact=True,
mask_character="X"
)
toxicity_config = ModerationToxicityConfig(
threshold=0.5
)
prompt_safety_config = ModerationPromptSafetyConfig(
threshold=0.5
)
moderation_config = BaseModerationConfig(
filters=[pii_config, toxicity_config, prompt_safety_config]
)
comp_moderation_with_config = AmazonComprehendModerationChain(
moderation_config=moderation_config, #specify the configuration
client=comprehend_client, #optionally pass the Boto3 Client
verbose=True
)
template = """Question: {question}
Answer:"""
prompt = PromptTemplate(template=template, input_variables=["question"])
responses = [
"Final Answer: A credit card number looks like 1289-2321-1123-2387. A fake SSN number looks like 323-22-9980. John Doe's phone number is (999)253-9876.",
"Final Answer: This is a really shitty way of constructing a birdhouse. This is fucking insane to think that any birds would actually create their motherfucking nests here."
]
llm = FakeListLLM(responses=responses)
llm_chain = LLMChain(prompt=prompt, llm=llm)
chain = (
prompt
| comp_moderation_with_config
| {llm_chain.input_keys[0]: lambda x: x['output'] }
| llm_chain
| { "input": lambda x: x['text'] }
| comp_moderation_with_config
)
try:
response = chain.invoke({"question": "A sample SSN number looks like this 123-456-7890. Can you give me some more samples?"})
except Exception as e:
print(str(e))
else:
print(response['output'])
```
### Output
```python
> Entering new AmazonComprehendModerationChain chain...
Running AmazonComprehendModerationChain...
Running pii Validation...
Running toxicity Validation...
Running prompt safety Validation...
> Finished chain.
> Entering new AmazonComprehendModerationChain chain...
Running AmazonComprehendModerationChain...
Running pii Validation...
Running toxicity Validation...
Running prompt safety Validation...
> Finished chain.
Final Answer: A credit card number looks like 1289-2321-1123-2387. A fake SSN number looks like XXXXXXXXXXXX John Doe's phone number is (999)253-9876.
```
---------
Co-authored-by: Jha <nikjha@amazon.com>
Co-authored-by: Anjan Biswas <anjanavb@amazon.com>
Co-authored-by: Anjan Biswas <84933469+anjanvb@users.noreply.github.com>
Type hinting `*args` as `List[Any]` means that each positional argument
should be a list. Type hinting `**kwargs` as `Dict[str, Any]` means that
each keyword argument should be a dict of strings.
This is almost never what we actually wanted, and doesn't seem to be
what we want in any of the cases I'm replacing here.
Minor lint dependency version upgrade to pick up latest functionality.
Ruff's new v0.1 version comes with lots of nice features, like
fix-safety guarantees and a preview mode for not-yet-stable features:
https://astral.sh/blog/ruff-v0.1.0
- **Description:** fixed a bug in pal-chain when it reports Python
code validation errors. When node.func does not have any ids, the
original code tried to print node.func.id in raising ValueError.
- **Issue:** n/a,
- **Dependencies:** no dependencies,
- **Tag maintainer:** @hazzel-cn, @eyurtsev
- **Twitter handle:** @lazyswamp
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>
Use `.copy()` to fix the bug that the first `llm_inputs` element is
overwritten by the second `llm_inputs` element in `intermediate_steps`.
***Problem description:***
In [line 127](
c732d8fffd/libs/experimental/langchain_experimental/sql/base.py (L127C17-L127C17)),
the `llm_inputs` of the sql generation step is appended as the first
element of `intermediate_steps`:
```
intermediate_steps.append(llm_inputs) # input: sql generation
```
However, `llm_inputs` is a mutable dict, it is updated in [line
179](https://github.com/langchain-ai/langchain/blob/master/libs/experimental/langchain_experimental/sql/base.py#L179)
for the final answer step:
```
llm_inputs["input"] = input_text
```
Then, the updated `llm_inputs` is appended as another element of
`intermediate_steps` in [line
180](c732d8fffd/libs/experimental/langchain_experimental/sql/base.py (L180)):
```
intermediate_steps.append(llm_inputs) # input: final answer
```
As a result, the final `intermediate_steps` returned in [line
189](c732d8fffd/libs/experimental/langchain_experimental/sql/base.py (L189C43-L189C43))
actually contains two same `llm_inputs` elements, i.e., the `llm_inputs`
for the sql generation step overwritten by the one for final answer step
by mistake. Users are not able to get the actual `llm_inputs` for the
sql generation step from `intermediate_steps`
Simply calling `.copy()` when appending `llm_inputs` to
`intermediate_steps` can solve this problem.
- Description: This PR adds a new chain `rl_chain.PickBest` for learned
prompt variable injection, detailed description and usage can be found
in the example notebook added. It essentially adds a
[VowpalWabbit](https://github.com/VowpalWabbit/vowpal_wabbit) layer
before the llm call in order to learn or personalize prompt variable
selections.
Most of the code is to make the API simple and provide lots of defaults
and data wrangling that is needed to use Vowpal Wabbit, so that the user
of the chain doesn't have to worry about it.
- Dependencies:
[vowpal-wabbit-next](https://pypi.org/project/vowpal-wabbit-next/),
- sentence-transformers (already a dep)
- numpy (already a dep)
- tagging @ataymano who contributed to this chain
- Tag maintainer: @baskaryan
- Twitter handle: @olgavrou
Added example notebook and unit tests
### Description
Add instance anonymization - if `John Doe` will appear twice in the
text, it will be treated as the same entity.
The difference between `PresidioAnonymizer` and
`PresidioReversibleAnonymizer` is that only the second one has a
built-in memory, so it will remember anonymization mapping for multiple
texts:
```
>>> anonymizer = PresidioAnonymizer()
>>> anonymizer.anonymize("My name is John Doe. Hi John Doe!")
'My name is Noah Rhodes. Hi Noah Rhodes!'
>>> anonymizer.anonymize("My name is John Doe. Hi John Doe!")
'My name is Brett Russell. Hi Brett Russell!'
```
```
>>> anonymizer = PresidioReversibleAnonymizer()
>>> anonymizer.anonymize("My name is John Doe. Hi John Doe!")
'My name is Noah Rhodes. Hi Noah Rhodes!'
>>> anonymizer.anonymize("My name is John Doe. Hi John Doe!")
'My name is Noah Rhodes. Hi Noah Rhodes!'
```
### Twitter handle
@deepsense_ai / @MaksOpp
### Tag maintainer
@baskaryan @hwchase17 @hinthornw
---------
Co-authored-by: Bagatur <baskaryan@gmail.com>