Commit Graph

5 Commits (926d4cfda734852c21c66cf7821f7e02b462864a)

Author SHA1 Message Date
Predrag Gruevski f94e24dfd7
Install and use `ruff format` instead of black for code formatting. (#12585)
Best to review one commit at a time, since two of the commits are 100%
autogenerated changes from running `ruff format`:
- Install and use `ruff format` instead of black for code formatting.
- Output of `ruff format .` in the `langchain` package.
- Use `ruff format` in experimental package.
- Format changes in experimental package by `ruff format`.
- Manual formatting fixes to make `ruff .` pass.
11 months ago
Erick Friis 1861cc7100
General anthropic functions, steps towards experimental integration tests (#11727)
To match change in js here
https://github.com/langchain-ai/langchainjs/pull/2892

Some integration tests need a bit more work in experimental:
![Screenshot 2023-10-12 at 12 02 49
PM](https://github.com/langchain-ai/langchain/assets/9557659/262d7d22-c405-40e9-afef-669e8d585307)

Pretty sure the sqldatabase ones are an actual regression or change in
interface because it's returning a placeholder.

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
11 months ago
maks-operlejn-ds a8f804a618
Add data anonymizer (#9863)
### Description

The feature for anonymizing data has been implemented. In order to
protect private data, such as when querying external APIs (OpenAI), it
is worth pseudonymizing sensitive data to maintain full privacy.

Anonynization consists of two steps:

1. **Identification:** Identify all data fields that contain personally
identifiable information (PII).
2. **Replacement**: Replace all PIIs with pseudo values or codes that do
not reveal any personal information about the individual but can be used
for reference. We're not using regular encryption, because the language
model won't be able to understand the meaning or context of the
encrypted data.

We use *Microsoft Presidio* together with *Faker* framework for
anonymization purposes because of the wide range of functionalities they
provide. The full implementation is available in `PresidioAnonymizer`.

### Future works

- **deanonymization** - add the ability to reverse anonymization. For
example, the workflow could look like this: `anonymize -> LLMChain ->
deanonymize`. By doing this, we will retain anonymity in requests to,
for example, OpenAI, and then be able restore the original data.
- **instance anonymization** - at this point, each occurrence of PII is
treated as a separate entity and separately anonymized. Therefore, two
occurrences of the name John Doe in the text will be changed to two
different names. It is therefore worth introducing support for full
instance detection, so that repeated occurrences are treated as a single
object.

### Twitter handle
@deepsense_ai / @MaksOpp

---------

Co-authored-by: MaksOpp <maks.operlejn@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
Martin Krasser 93260a9922
Fix broken `make` targets `format_diff` and `lint_diff` (#8344)
Since the refactoring into sub-projects `libs/langchain` and
`libs/experimental`, the `make` targets `format_diff` and `lint_diff` do
not work anymore when running `make` from these subdirectories. Reason
is that

```
PYTHON_FILES=$(shell git diff --name-only --diff-filter=d master | grep -E '\.py$$|\.ipynb$$')
```

generates paths from the project's root directory instead of the
corresponding subdirectories. This PR fixes this by adding a
`--relative` command line option.

- Tag maintainer: @baskaryan
1 year ago
Harrison Chase da04760de1
Harrison/move experimental (#8084) 1 year ago