code of conduct.md file is missing it is generally present in good repos
which have large community
Replace this entire comment with:
- **Description:** Added a `code_of_conduct.md` file to the repository
to establish community standards and guidelines for contributors.
- **Issue:** N/A
- **Dependencies:** N/A
- **Tag maintainer:** N/A
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
We don't use any of the new functionality at the moment. Just making
sure we don't fall back on versions and fail to benefit from new
patches. This is an easy upgrade and it's always harder to upgrade
across multiple major versions at once.
Adding description of the `View deployment` button on the PR page. This
nice feature was not documented.
---------
Co-authored-by: Erick Friis <erickfriis@gmail.com>
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- **Description:** a description of the change,
- **Issue:** the issue # it fixes (if applicable),
- **Dependencies:** any dependencies required for this change,
- **Tag maintainer:** for a quicker response, tag the relevant
maintainer (see below),
- **Twitter handle:** we announce bigger features on Twitter. If your PR
gets announced, and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. It lives in `docs/extras`
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17.
-->
### Description
renamed several repository links from `hwchase17` to `langchain-ai`.
### Why
I discovered that the README file in the devcontainer contains an old
repository name, so I took the opportunity to rename the old repository
name in all files within the repository, excluding those that do not
require changes.
### Dependencies
none
### Tag maintainer
@baskaryan
### Twitter handle
[kzk_maeda](https://twitter.com/kzk_maeda)
Adds LangServe package
* Integrate Runnables with Fast API creating Server and a RemoteRunnable
client
* Support multiple runnables for a given server
* Support sync/async/batch/abatch/stream/astream/astream_log on the
client side (using async implementations on server)
* Adds validation using annotations (relying on pydantic under the hood)
-- this still has some rough edges -- e.g., open api docs do NOT
generate correctly at the moment
* Uses pydantic v1 namespace
Known issues: type translation code doesn't handle a lot of types (e.g.,
TypedDicts)
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
`mypy` cannot type-check code that relies on dependencies that aren't
installed.
Eventually we'll probably want to install as many optional dependencies
as possible. However, the full "extended deps" setup for langchain
creates a 3GB cache file and takes a while to unpack and install. We'll
probably want something a bit more targeted.
This is a first step toward something better.
A test file was accidentally dropping a `results.json` file in the
current working directory as a result of running `make test`.
This is undesirable, since we don't want to risk accidentally adding
stray files into the repo if we run tests locally and then do `git add
.` without inspecting the file list very closely.
It seems the caching action was not always correctly recreating
softlinks. At first glance, the softlinks it created seemed fine, but
they didn't always work. Possibly hitting some kind of underlying bug,
but not particularly worth debugging in depth -- we can manually create
the soft links we need.
- Revert "Temporarily disable step that seems to be transiently failing.
(#10234)"
- Refresh shell hashtable and show poetry/python location and version.
Make sure that changes to CI infrastructure get tested on CI before
being merged.
Without this PR, changes to the poetry setup action don't trigger a CI
run and in principle could break `master` when merged.
### Description
The feature for anonymizing data has been implemented. In order to
protect private data, such as when querying external APIs (OpenAI), it
is worth pseudonymizing sensitive data to maintain full privacy.
Anonynization consists of two steps:
1. **Identification:** Identify all data fields that contain personally
identifiable information (PII).
2. **Replacement**: Replace all PIIs with pseudo values or codes that do
not reveal any personal information about the individual but can be used
for reference. We're not using regular encryption, because the language
model won't be able to understand the meaning or context of the
encrypted data.
We use *Microsoft Presidio* together with *Faker* framework for
anonymization purposes because of the wide range of functionalities they
provide. The full implementation is available in `PresidioAnonymizer`.
### Future works
- **deanonymization** - add the ability to reverse anonymization. For
example, the workflow could look like this: `anonymize -> LLMChain ->
deanonymize`. By doing this, we will retain anonymity in requests to,
for example, OpenAI, and then be able restore the original data.
- **instance anonymization** - at this point, each occurrence of PII is
treated as a separate entity and separately anonymized. Therefore, two
occurrences of the name John Doe in the text will be changed to two
different names. It is therefore worth introducing support for full
instance detection, so that repeated occurrences are treated as a single
object.
### Twitter handle
@deepsense_ai / @MaksOpp
---------
Co-authored-by: MaksOpp <maks.operlejn@gmail.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
<!-- Thank you for contributing to LangChain!
Replace this entire comment with:
- Description: a description of the change,
- Issue: the issue # it fixes (if applicable),
- Dependencies: any dependencies required for this change,
- Tag maintainer: for a quicker response, tag the relevant maintainer
(see below),
- Twitter handle: we announce bigger features on Twitter. If your PR
gets announced and you'd like a mention, we'll gladly shout you out!
Please make sure your PR is passing linting and testing before
submitting. Run `make format`, `make lint` and `make test` to check this
locally.
See contribution guidelines for more information on how to write/run
tests, lint, etc:
https://github.com/hwchase17/langchain/blob/master/.github/CONTRIBUTING.md
If you're adding a new integration, please include:
1. a test for the integration, preferably unit tests that do not rely on
network access,
2. an example notebook showing its use. These live is docs/extras
directory.
If no one reviews your PR within a few days, please @-mention one of
@baskaryan, @eyurtsev, @hwchase17, @rlancemartin.
-->
Hi LangChain :) Thank you for such a great project!
I was going through the CONTRIBUTING.md and found a few minor issues.
With this PR:
- All lint and test jobs use the exact same Python + Poetry installation
approach, instead of lints doing it one way and tests doing it another
way.
- The Poetry installation itself is cached, which saves ~15s per run.
- We no longer pass shell commands as workflow arguments to a workflow
that just runs them in a shell. This makes our actions more resilient to
shell code injection.
If y'all like this approach, I can modify the scheduled tests workflow
and the release workflow to use this too.
Update installation instructions to only install test dependencies rather than all dependencies.
---------
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
If another push to the same PR or branch happens while its CI is still
running, cancel the earlier run in favor of the next run.
There's no point in testing an outdated version of the code. GitHub only
allows a limited number of job runners to be active at the same time, so
it's better to cancel pointless jobs early so that more useful jobs can
run sooner.