langchain/libs/experimental/langchain_experimental
maks-operlejn-ds 274c3dc3a8
Multilingual anonymization (#10327)
### Description

Add multiple language support to Anonymizer

PII detection in Microsoft Presidio relies on several components. In
addition to the usual pattern matching (e.g. using regex), the analyser
uses a Named Entity Recognition (NER) model to extract entities such
as:
- `PERSON`
- `LOCATION`
- `DATE_TIME`
- `NRP`
- `ORGANIZATION`


[[Source]](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/predefined_recognizers/spacy_recognizer.py)
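The sketch below makes the two-part detection concrete. It is minimal and library-free and rests on a toy setup. A regex recognizer handles pattern-based PII (an email address). A hard-coded name lookup stands in for the statistical NER model that Presidio would run through spaCy. The `Finding` type and the functions `regex_findings`, `toy_ner`, and `analyze` are hypothetical names used only for illustration. They are not Presidio or LangChain APIs.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Finding:
    """A detected PII span (hypothetical type, for illustration only)."""

    entity_type: str
    start: int
    end: int


# Pattern-based recognizer: a deliberately simple email regex, not exhaustive.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def regex_findings(text: str) -> List[Finding]:
    """Return every email-like span found by pattern matching."""
    return [Finding("EMAIL_ADDRESS", m.start(), m.end()) for m in EMAIL_RE.finditer(text)]


# Stand-in for a statistical NER model: a fixed gazetteer of names.
# A real pipeline would run a spaCy (or Stanza/transformers) model here.
GAZETTEER = {"John Smith": "PERSON", "Berlin": "LOCATION"}


def toy_ner(text: str) -> List[Finding]:
    """Return PERSON/LOCATION spans for every gazetteer entry found in `text`."""
    findings = []
    for phrase, label in GAZETTEER.items():
        for m in re.finditer(re.escape(phrase), text):
            findings.append(Finding(label, m.start(), m.end()))
    return findings


def analyze(text: str, recognizers: List[Callable[[str], List[Finding]]]) -> List[Finding]:
    """Run every recognizer and merge the results, ordered by position."""
    results: List[Finding] = []
    for recognize in recognizers:
        results.extend(recognize(text))
    return sorted(results, key=lambda f: f.start)


text = "John Smith moved to Berlin; email john@example.com"
for f in analyze(text, [regex_findings, toy_ner]):
    print(f.entity_type, repr(text[f.start:f.end]))
# PERSON 'John Smith'
# LOCATION 'Berlin'
# EMAIL_ADDRESS 'john@example.com'
```

Each recognizer runs independently, and the analyzer only merges their spans. Replacing the lookup with a real per-language NER model would change what gets detected, but the merge step would stay the same.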

To handle NER in a specific language, we use a language-specific model
from the `spaCy` library, which offers a broad selection of pipelines
covering many languages and model sizes. The setup is not tied to
spaCy, though: alternative frameworks such as
[Stanza](https://microsoft.github.io/presidio/analyzer/nlp_engines/spacy_stanza/)
or
[transformers](https://microsoft.github.io/presidio/analyzer/nlp_engines/transformers/)
can be integrated when necessary.
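The sketch below shows a minimal per-language setup. It assumes the configuration shape that Presidio's NLP engine provider accepts: an `nlp_engine_name` plus a list of `models`, where each entry pairs a `lang_code` with a `model_name`. The dictionary name `LANGUAGES_CONFIG` and the helper `model_for_language` are hypothetical. The lookup only illustrates how a language code passed to the anonymizer would select the matching spaCy pipeline.

```python
from typing import Any, Dict

# Presidio-style NLP engine configuration: one spaCy pipeline per language code.
# These are real spaCy pipeline names; which ones to install is up to you.
LANGUAGES_CONFIG: Dict[str, Any] = {
    "nlp_engine_name": "spacy",
    "models": [
        {"lang_code": "en", "model_name": "en_core_web_lg"},
        {"lang_code": "de", "model_name": "de_core_news_md"},
        {"lang_code": "es", "model_name": "es_core_news_md"},
    ],
}


def model_for_language(config: Dict[str, Any], language: str) -> str:
    """Hypothetical helper: return the NER model configured for `language`.

    Raises ValueError instead of silently falling back to another language,
    since a model trained on one language is likely to miss entities in another.
    """
    models = {entry["lang_code"]: entry["model_name"] for entry in config["models"]}
    if language not in models:
        raise ValueError(
            f"No NER model configured for {language!r}; available: {sorted(models)}"
        )
    return models[language]


print(model_for_language(LANGUAGES_CONFIG, "de"))  # de_core_news_md
```

An alternative engine such as Stanza or transformers would be selected by changing `nlp_engine_name`. The Presidio NLP engine pages linked above describe the exact fields each engine expects.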

### Future works

- **automatic language detection**: instead of passing the language as
a parameter in `anonymizer.anonymize`, we could detect the language(s)
beforehand and then use the corresponding NER model. We have discussed
this internally, and @mateusz-wosinski-ds will look into a standalone
language detection tool/chain for LangChain 😄
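The sketch below shows where automatic language detection would plug in. It is a deliberately naive stand-in that scores the input against tiny stopword lists and picks the best-matching language among those with a configured NER model. A real detector would use a trained language-identification model instead. The stopword lists, the function name `detect_language`, and its `default` fallback are all hypothetical.

```python
import re
from typing import Dict, Iterable, Set

# Tiny illustrative stopword lists; a real detector would use a trained model.
STOPWORDS: Dict[str, Set[str]] = {
    "en": {"the", "and", "is", "my", "name", "of", "to"},
    "de": {"der", "die", "und", "ist", "mein", "ich", "heiße"},
    "es": {"el", "la", "y", "es", "me", "llamo", "mi"},
}


def detect_language(text: str, supported: Iterable[str], default: str = "en") -> str:
    """Return the supported language whose stopwords overlap most with `text`.

    Only languages in `supported` (i.e. those with a configured NER model)
    are considered; if nothing matches, `default` is returned.
    """
    tokens = set(re.findall(r"\w+", text.lower()))
    best, best_score = default, 0
    for lang in supported:
        score = len(tokens & STOPWORDS.get(lang, set()))
        if score > best_score:
            best, best_score = lang, score
    return best


print(detect_language("Me llamo Sofía y mi ciudad es Madrid", ["en", "de", "es"]))  # es
```

The detected code would then be passed as the language to `anonymizer.anonymize`, so callers would no longer need to supply it themselves.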

### Twitter handle
@deepsense_ai / @MaksOpp

### Tag maintainer
@baskaryan @hwchase17 @hinthornw
2023-09-07 14:42:24 -07:00
| Name | Last commit | Date |
|---|---|---|
| autonomous_agents | Harrison/string inplace (#10153) | 2023-09-03 14:25:29 -07:00 |
| comprehend_moderation | Harrison/string inplace (#10153) | 2023-09-03 14:25:29 -07:00 |
| cpal | Add security notices on PAL and CPAL experimental chains. (#9938) | 2023-08-29 13:51:56 -04:00 |
| data_anonymizer | Multilingual anonymization (#10327) | 2023-09-07 14:42:24 -07:00 |
| fallacy_removal | adding new chain for logical fallacy removal from model output in chain (#9887) | 2023-09-03 15:44:27 -07:00 |
| generative_agents | Use a submodule for pydantic v1 compat (#9371) | 2023-08-17 16:35:49 +01:00 |
| graph_transformers | Diffbot Graph Transformer / Neo4j Graph document ingestion (#9979) | 2023-09-06 13:32:59 -07:00 |
| llms | Use a submodule for pydantic v1 compat (#9371) | 2023-08-17 16:35:49 +01:00 |
| pal_chain | Add security notices on PAL and CPAL experimental chains. (#9938) | 2023-08-29 13:51:56 -04:00 |
| plan_and_execute | Use a submodule for pydantic v1 compat (#9371) | 2023-08-17 16:35:49 +01:00 |
| prompts | Harrison/official pre release (#8106) | 2023-07-21 18:44:32 -07:00 |
| pydantic_v1 | poetry lock the experimental package. (#9478) | 2023-08-22 14:09:35 -04:00 |
| retrievers | Resolve: VectorSearch enabled SQLChain? (#10177) | 2023-09-06 17:08:12 -07:00 |
| smart_llm | Use a submodule for pydantic v1 compat (#9371) | 2023-08-17 16:35:49 +01:00 |
| sql | Resolve: VectorSearch enabled SQLChain? (#10177) | 2023-09-06 17:08:12 -07:00 |
| tot | Use a submodule for pydantic v1 compat (#9371) | 2023-08-17 16:35:49 +01:00 |
| __init__.py | Use a submodule for pydantic v1 compat (#9371) | 2023-08-17 16:35:49 +01:00 |
| py.typed | Add py.typed file to langchain-experimental. (#9557) | 2023-08-21 15:37:16 -04:00 |