langchain/libs/experimental/langchain_experimental/data_anonymizer
maks-operlejn-ds 2aae1102b0
Instance anonymization (#10501)
### Description

Add instance anonymization: if `John Doe` appears twice in the
text, both occurrences are treated as the same entity and receive the
same fake value.
The difference between `PresidioAnonymizer` and
`PresidioReversibleAnonymizer` is that only the latter has built-in
memory, so it remembers the anonymization mapping across multiple
texts:

```
>>> from langchain_experimental.data_anonymizer import PresidioAnonymizer
>>> anonymizer = PresidioAnonymizer()
>>> anonymizer.anonymize("My name is John Doe. Hi John Doe!")
'My name is Noah Rhodes. Hi Noah Rhodes!'
>>> anonymizer.anonymize("My name is John Doe. Hi John Doe!")
'My name is Brett Russell. Hi Brett Russell!'
```
```
>>> from langchain_experimental.data_anonymizer import PresidioReversibleAnonymizer
>>> anonymizer = PresidioReversibleAnonymizer()
>>> anonymizer.anonymize("My name is John Doe. Hi John Doe!")
'My name is Noah Rhodes. Hi Noah Rhodes!'
>>> anonymizer.anonymize("My name is John Doe. Hi John Doe!")
'My name is Noah Rhodes. Hi Noah Rhodes!'
```

### Twitter handle
@deepsense_ai / @MaksOpp

### Tag maintainer
@baskaryan @hwchase17 @hinthornw

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2023-10-05 11:23:02 -07:00
| File | Last commit | Date |
|---|---|---|
| __init__.py | Data deanonymization (#10093) | 2023-09-06 21:33:24 -07:00 |
| base.py | Multilingual anonymization (#10327) | 2023-09-07 14:42:24 -07:00 |
| deanonymizer_mapping.py | Instance anonymization (#10501) | 2023-10-05 11:23:02 -07:00 |
| deanonymizer_matching_strategies.py | Data deanonymization (#10093) | 2023-09-06 21:33:24 -07:00 |
| faker_presidio_mapping.py | Multilingual anonymization (#10327) | 2023-09-07 14:42:24 -07:00 |
| presidio.py | Instance anonymization (#10501) | 2023-10-05 11:23:02 -07:00 |