mirror of https://github.com/hwchase17/langchain synced 2024-10-29 17:07:25 +00:00


maks-operlejn-ds a8f804a618 Add data anonymizer (#9863)		2023-08-30 10:39:44 -07:00

### Description
The feature for anonymizing data has been implemented. To protect private data, for example when querying external APIs such as OpenAI, it is worth pseudonymizing sensitive data to maintain full privacy. Anonymization consists of two steps:
1. Identification: identify all data fields that contain personally identifiable information (PII).
2. Replacement: replace all PII with pseudo values or codes that do not reveal any personal information about the individual but can be used for reference.

We're not using regular encryption, because the language model would not be able to understand the meaning or context of the encrypted data.

We use Microsoft Presidio together with the Faker framework for anonymization because of the wide range of functionality they provide. The full implementation is available in `PresidioAnonymizer`.

### Future works
- deanonymization: add the ability to reverse anonymization. For example, the workflow could look like this: `anonymize -> LLMChain -> deanonymize`. This keeps requests to, for example, OpenAI anonymous while still making it possible to restore the original data afterwards.
- instance anonymization: at this point, each occurrence of PII is treated as a separate entity and anonymized separately, so two occurrences of the name John Doe in the text will be changed to two different names. It is therefore worth introducing support for full instance detection, so that repeated occurrences are treated as a single object.

### Twitter handle
@deepsense_ai / @MaksOpp

---------
Co-authored-by: MaksOpp <maks.operlejn@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
langchain_experimental	Add data anonymizer (#9863)	2023-08-30 10:39:44 -07:00
tests	Add data anonymizer (#9863)	2023-08-30 10:39:44 -07:00
Makefile	Add data anonymizer (#9863)	2023-08-30 10:39:44 -07:00
poetry.lock	Add data anonymizer (#9863)	2023-08-30 10:39:44 -07:00
poetry.toml	Harrison/move experimental (#8084)	2023-07-21 10:36:28 -07:00
pyproject.toml	Add data anonymizer (#9863)	2023-08-30 10:39:44 -07:00
README.md	Add notice about security-sensitive experimental code to experimental README. (#9936)	2023-08-29 14:21:30 -04:00
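As a rough, self-contained illustration of the two anonymization steps described in the #9863 commit message (identification, then replacement), the sketch below uses plain regular expressions in place of Microsoft Presidio's recognizers and numbered placeholder codes in place of Faker-generated values. The names `RECOGNIZERS`, `identify`, `anonymize`, and `deanonymize`, and the `consistent` flag, are hypothetical and are not the `PresidioAnonymizer` API. The `consistent` option and the `deanonymize` helper only loosely mirror the "instance anonymization" and "deanonymization" items listed as future work.

```python
import itertools
import re
from typing import Dict, List, Tuple

# Hypothetical toy recognizers standing in for Presidio's analyzer:
# each maps an entity type to a regular expression. Overlapping matches
# are not handled; this is an illustration, not a PII detector.
RECOGNIZERS: Dict[str, re.Pattern] = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE_NUMBER": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def identify(text: str) -> List[Tuple[int, int, str]]:
    """Step 1 (identification): return sorted (start, end, entity) spans."""
    spans = []
    for entity, pattern in RECOGNIZERS.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), entity))
    return sorted(spans)


def anonymize(text: str, consistent: bool = False) -> Tuple[str, Dict[str, str]]:
    """Step 2 (replacement): swap each PII span for a placeholder code.

    consistent=False gives every occurrence a fresh code, like the per-occurrence
    behaviour the commit describes; consistent=True reuses one code per distinct
    value. Also returns a code -> original mapping so the text can be restored.
    """
    counter = itertools.count(1)
    value_to_code: Dict[str, str] = {}
    code_to_value: Dict[str, str] = {}
    pieces: List[str] = []
    last = 0
    for start, end, entity in identify(text):
        original = text[start:end]
        if consistent and original in value_to_code:
            code = value_to_code[original]
        else:
            code = f"<{entity}_{next(counter)}>"
            value_to_code[original] = code
            code_to_value[code] = original
        pieces.append(text[last:start])  # untouched text before the span
        pieces.append(code)              # placeholder instead of the PII
        last = end
    pieces.append(text[last:])
    return "".join(pieces), code_to_value


def deanonymize(text: str, mapping: Dict[str, str]) -> str:
    """Reverse anonymize() by substituting each code with its original value."""
    for code, original in mapping.items():
        text = text.replace(code, original)
    return text
```

For example, `anonymize("Mail a@b.com or a@b.com, call 313-666-7440.")` returns the text `"Mail <EMAIL_ADDRESS_1> or <EMAIL_ADDRESS_2>, call <PHONE_NUMBER_3>."`, while passing `consistent=True` yields `"Mail <EMAIL_ADDRESS_1> or <EMAIL_ADDRESS_1>, call <PHONE_NUMBER_2>."`. In both cases `deanonymize` with the returned mapping restores the original string, which is the property an `anonymize -> LLMChain -> deanonymize` workflow would rely on.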

README.md

🦜️🧪 LangChain Experimental

This package holds experimental LangChain code, intended for research and experimental uses.

Warning

Portions of the code in this package may be dangerous if not properly deployed in a sandboxed environment. Please be wary of deploying experimental code to production unless you've taken appropriate precautions and have already discussed it with your security team.

Some of the code here may be marked with security notices. However, given the exploratory and experimental nature of the code in this package, the lack of a security notice on a piece of code does not mean that the code in question does not require additional security considerations in order to be safe to use.
