Commit Graph

74 Commits (248db75cd60596efd932dd83e767e4b55529f778)

Author SHA1 Message Date
Bagatur 41a2548611 Fix presidio docs Colab links 1 year ago
Bagatur 1d2b6c3c67 Reorganize presidio anonymization docs 1 year ago
maks-operlejn-ds 274c3dc3a8
Multilingual anonymization (#10327)
### Description

Add multiple language support to Anonymizer

PII detection in Microsoft Presidio relies on several components - in
addition to the usual pattern matching (e.g. using regex), the analyser
uses a model for Named Entity Recognition (NER) to extract entities such
as:
- `PERSON`
- `LOCATION`
- `DATE_TIME`
- `NRP`
- `ORGANIZATION`


[[Source]](https://github.com/microsoft/presidio/blob/main/presidio-analyzer/presidio_analyzer/predefined_recognizers/spacy_recognizer.py)

To handle NER in specific languages, we utilize unique models from the
`spaCy` library, recognized for its extensive selection covering
multiple languages and sizes. However, it's not restrictive, allowing
for integration of alternative frameworks such as
[Stanza](https://microsoft.github.io/presidio/analyzer/nlp_engines/spacy_stanza/)
or
[transformers](https://microsoft.github.io/presidio/analyzer/nlp_engines/transformers/)
when necessary.

### Future works

- **automatic language detection** - instead of passing the language as
a parameter in `anonymizer.anonymize`, we could detect the language/s
beforehand and then use the corresponding NER model. We have discussed
this internally and @mateusz-wosinski-ds will look into a standalone
language detection tool/chain for LangChain 😄

### Twitter handle
@deepsense_ai / @MaksOpp

### Tag maintainer
@baskaryan @hwchase17 @hinthornw
1 year ago
maks-operlejn-ds f747e76b73
Fixed link to colab notebook (#10320)
small fix to anonymizer documentation
1 year ago
maks-operlejn-ds 4cc4534d81
Data deanonymization (#10093)
### Description

The feature for pseudonymizing data with ability to retrieve original
text (deanonymization) has been implemented. In order to protect private
data, such as when querying external APIs (OpenAI), it is worth
pseudonymizing sensitive data to maintain full privacy. But then, after
the model response, it would be good to have the data in the original
form.

I implemented the `PresidioReversibleAnonymizer`, which consists of two
parts:

1. anonymization - it works the same way as `PresidioAnonymizer`, plus
the object itself stores a mapping of made-up values to original ones,
for example:
```
    {
        "PERSON": {
            "<anonymized>": "<original>",
            "John Doe": "Slim Shady"
        },
        "PHONE_NUMBER": {
            "111-111-1111": "555-555-5555"
        }
        ...
    }
```

2. deanonymization - using the mapping described above, it matches fake
data with original data and then substitutes it.

Between anonymization and deanonymization user can perform different
operations, for example, passing the output to LLM.

### Future works

- **instance anonymization** - at this point, each occurrence of PII is
treated as a separate entity and separately anonymized. Therefore, two
occurrences of the name John Doe in the text will be changed to two
different names. It is therefore worth introducing support for full
instance detection, so that repeated occurrences are treated as a single
object.
- **better matching and substitution of fake values for real ones** -
currently the strategy is based on matching full strings and then
substituting them. Due to the indeterminism of language models, it may
happen that the value in the answer is slightly changed (e.g. *John Doe*
-> *John* or *Main St, New York* -> *New York*) and such a substitution
is then no longer possible. Therefore, it is worth adjusting the
matching for your needs.
- **Q&A with anonymization** - when I'm done writing all the
functionality, I thought it would be a cool resource in documentation to
write a notebook about retrieval from documents using anonymization. An
iterative process, adding new recognizers to fit the data, lessons
learned and what to look out for

### Twitter handle
@deepsense_ai / @MaksOpp

---------

Co-authored-by: MaksOpp <maks.operlejn@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
Ikko Eltociear Ashimine ea5d29a702
Update amazon_comprehend_chain.ipynb (#10246)
Huggingface, HuggingFace -> Hugging Face
1 year ago
Aashish Saini 8bba69ffd0
Fixed some grammatical typos in doc files (#10191)
Fixed some grammatical typos in doc files
CC: @baskaryan, @eyurtsev, @rlancemartin.

---------

Co-authored-by: Aashish Saini <141953346+AashishSainiShorthillsAI@users.noreply.github.com>
Co-authored-by: AryamanJaiswalShorthillsAI <142397527+AryamanJaiswalShorthillsAI@users.noreply.github.com>
Co-authored-by: Adarsh Shrivastav <142413097+AdarshKumarShorthillsAI@users.noreply.github.com>
Co-authored-by: Vishal <141389263+VishalYadavShorthillsAI@users.noreply.github.com>
Co-authored-by: ChetnaGuptaShorthillsAI <142381084+ChetnaGuptaShorthillsAI@users.noreply.github.com>
Co-authored-by: PankajKumarShorthillsAI <142473460+PankajKumarShorthillsAI@users.noreply.github.com>
Co-authored-by: AbhishekYadavShorthillsAI <142393903+AbhishekYadavShorthillsAI@users.noreply.github.com>
Co-authored-by: AmitSinghShorthillsAI <142410046+AmitSinghShorthillsAI@users.noreply.github.com>
Co-authored-by: Md Nazish Arman <142379599+MdNazishArmanShorthillsAI@users.noreply.github.com>
Co-authored-by: KamalSharmaShorthillsAI <142474019+KamalSharmaShorthillsAI@users.noreply.github.com>
Co-authored-by: Lakshya <lakshyagupta87@yahoo.com>
Co-authored-by: Aayush <142384656+AayushShorthillsAI@users.noreply.github.com>
Co-authored-by: AnujMauryaShorthillsAI <142393269+AnujMauryaShorthillsAI@users.noreply.github.com>
1 year ago
Nino Risteski 433c4a721e
typo in locall llms fixed (#9755)
Hi, 

I noticed a typo in the local_llms.ipynb file and fixed it. The word
challenge is without 'a' in the original file.
@baskaryan , @eyurtsev

Thanks.

Co-authored-by: Fliprise <fliprise@Fliprises-MacBook-Pro.local>
1 year ago
Aashish Saini fe0e191fb3
Made some Grammatical error fixes (#10156)
Made some Grammatical error fixes.
CC: @baskaryan, @eyurtsev, @rlancemartin.

---------

Co-authored-by: Aashish Saini <141953346+AashishSainiShorthillsAI@users.noreply.github.com>
Co-authored-by: AryamanJaiswalShorthillsAI <142397527+AryamanJaiswalShorthillsAI@users.noreply.github.com>
Co-authored-by: Adarsh Shrivastav <142413097+AdarshKumarShorthillsAI@users.noreply.github.com>
Co-authored-by: Vishal <141389263+VishalYadavShorthillsAI@users.noreply.github.com>
Co-authored-by: ChetnaGuptaShorthillsAI <142381084+ChetnaGuptaShorthillsAI@users.noreply.github.com>
Co-authored-by: PankajKumarShorthillsAI <142473460+PankajKumarShorthillsAI@users.noreply.github.com>
Co-authored-by: AbhishekYadavShorthillsAI <142393903+AbhishekYadavShorthillsAI@users.noreply.github.com>
1 year ago
Lance Martin 16a27ab244
Add prompt hub for various use-cases (#9879)
Use prompt hub in our use-case docs and guides.
1 year ago
Bagatur 5f1c67b47c
Mv LCEL docs up a level (#10073) 1 year ago
Harrison Chase ad9e242a7a
add snippet for max concurrency (#9892) 1 year ago
Bagatur 8d66b00c73
Data anonymizer notebook nit (#10062) 1 year ago
Bagatur 7fa82900cb
guides docs nits (#10005) 1 year ago
Bagatur 2f03e71e67
rename local llm guide (#10004) 1 year ago
Bagatur 781f274d19
make privacy guide section (#10003) 1 year ago
Bagatur 98cce7dcd3
update moderation docs (#10002) 1 year ago
Anish Shah fa0b8f3368
fix broken wandb link in debugging page (#9771)
- Description: Fix broken hyperlink in debugging page
1 year ago
seamusp 25f2c82ae8
docs:misc fixes (#9671)
Improve internal consistency in LangChain documentation
- Change occurrences of eg and eg. to e.g.
- Fix headers containing unnecessary capital letters.
- Change instances of "few shot" to "few-shot".
- Add periods to end of sentences where missing.
- Minor spelling and grammar fixes.
1 year ago
Vanessa Arndorfer 1ea2f9adf4
Document AzureML Deployment Example (#9571)
Description: Link an example of deploying a Langchain app to an AzureML
online endpoint to the deployments documentation page.

Co-authored-by: Vanessa Arndorfer <vaarndor@microsoft.com>
1 year ago
Bagatur d87cfd33e8
Update pydantic compatibility guide (#9496) 1 year ago
NavanitDubeyShorthillsAI b58d492e05
Update pydantic_compatibility.md (#9382)
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
1 year ago
AmitSinghShorthillsAI 2b06792c81
Fixing spelling mistakes in fallbacks.ipynb (#9376)
Fix spelling errors in the text: 'Therefore' and 'Retrying

I want to stress that your feedback is invaluable to us and is genuinely
cherished.
With gratitude,
@baskaryan  @hwchase17
1 year ago
Bagatur 5d60ced7b3
pydantic compatibility guide fix (#9418) 1 year ago
Bagatur 0c4683ebcc
Revert "Update compatibility guide for pydantic (#9396)" (#9417) 1 year ago
Eugene Yurtsev b11c233304
Update compatibility guide for pydantic (#9396)
Use langchain.pydantic_v1 instead of pydantic_v1
1 year ago
Sanskar Tanwar c194828be0
Fixed Typo in Fallbacks.ipynb (#9373)
Removed extra "the" in the sentence about the chicken crossing the road
in fallbacks.ipynb. The sentence now reads correctly: "Why did the
chicken cross the road?" This resolves the grammatical error and
improves the overall quality of the content.

@baskaryan , @hinthornw , @hwchase17
1 year ago
Lance Martin b04e472acf
Open source LLM guide (#9266)
Guide for using open source LLMs locally.
1 year ago
Eugene Yurtsev 0f9f213833
Pydantic Compatibility (#9327)
Pydantic Compatibility Guidelines for migration plan + debugging
1 year ago
Harrison Chase 71d5b7c9bf
Harrison/fallbacks (#9233)
Co-authored-by: Bagatur <baskaryan@gmail.com>
1 year ago
Bagatur 249d7d06a2
adapter doc nit (#9234) 1 year ago
Harrison Chase 8d69dacdf3
multiple retreival in parralel (#9174) 1 year ago
Harrison Chase bb6fbf4c71
openai adapters (#8988)
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Co-authored-by: Nuno Campos <nuno@boringbits.io>
1 year ago
Bagatur 0a1be1d501
document lcel fallbacks (#8942) 1 year ago
Nuno Campos 9892e95d03
Add flush=True to stream examples (#8862) 1 year ago
Harrison Chase 472f00ada7
add moderation example (#8718) 1 year ago
Harrison Chase 2bb1d256f3
add example of memory and returning retrieved docs (#8830) 1 year ago
axa99 1f54ec899b
updated interface jupyter notebook explanations (#8689)
Updated the documentation in the interface.ipynb to clearly show the
_input_ and _output_ types for various components @baskaryan
1 year ago
Harrison Chase 66226d1d4d
add example for memory (#8552) 1 year ago
Harrison Chase bca0749a11
conversational retrieval chain in lcel (#8532) 1 year ago
Harrison Chase 5e3b968078
router runnable (#8496)
Co-authored-by: Nuno Campos <nuno@boringbits.io>
1 year ago
Harrison Chase ae4638aa35
improve notebooks (#8461) 1 year ago
Harrison Chase 412fa4e1db
add guide notebook (#8258)
<!-- Thank you for contributing to LangChain!

Replace this comment with:
  - Description: a description of the change, 
  - Issue: the issue # it fixes (if applicable),
  - Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!

Please make sure you're PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.

If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
  2. an example notebook showing its use.

Maintainer responsibilities:
  - General / Misc / if you don't know who to tag: @baskaryan
  - DataLoaders / VectorStores / Retrievers: @rlancemartin, @eyurtsev
  - Models / Prompts: @hwchase17, @baskaryan
  - Memory: @hwchase17
  - Agents / Tools / Toolkits: @hinthornw
  - Tracing / Callbacks: @agola11
  - Async: @agola11

If no one reviews your PR within a few days, feel free to @-mention the
same people again.

See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
 -->

---------

Co-authored-by: Nuno Campos <nuno@boringbits.io>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
1 year ago
Bagatur 55beab326c
cleanup warnings (#8379) 1 year ago
Bagatur 68763bd25f
mv popular and additional chains to use cases (#8242) 1 year ago
William FH 9eb7e6e27f
Delete Old Evals Examples (#8252)
Still retain:
- Comparison Examples
- Data + QA walkthrough
- QA (but really minimize it)
1 year ago
William FH 30c2d3cd06
Update references (#8243) 1 year ago
Bagatur 483f6c2fe3
mv eval docs (#8209) 1 year ago
Bagatur 854a2be0ca
Add debugging guide (#7956) 1 year ago
William FH c6f2d27789
Docs Nits (#7874)
Add links to reference docs
1 year ago