langchain/.github/workflows/langchain_experimental_ci.yml

---
name: libs/experimental CI

on:
  push:
    branches: [master]
  pull_request:
    paths:
      - ".github/actions/poetry_setup/action.yml"
      - ".github/tools/**"
      - ".github/workflows/_lint.yml"
      - ".github/workflows/_test.yml"
      - ".github/workflows/langchain_experimental_ci.yml"
      - "libs/*"
      - "libs/experimental/**"
      - "libs/langchain/**"
      - "libs/core/**"
  workflow_dispatch: # Allows triggering the workflow manually from the GitHub UI

# If another push to the same PR or branch happens while this workflow is still running,
# cancel the earlier run in favor of the next run.
#
# There's no point in testing an outdated version of the code. GitHub only allows
# a limited number of job runners to be active at the same time, so it's better to cancel
# pointless jobs early so that more useful jobs can run sooner.
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

env:
  POETRY_VERSION: "1.6.1"
  WORKDIR: "libs/experimental"

jobs:
  lint:
    uses: ./.github/workflows/_lint.yml
    with:
      working-directory: libs/experimental
    secrets: inherit

  test:
    uses: ./.github/workflows/_test.yml
    with:
      working-directory: libs/experimental
    secrets: inherit

  compile-integration-tests:
    uses: ./.github/workflows/_compile_integration_test.yml
    with:
      working-directory: libs/experimental
    secrets: inherit

  # It's possible that langchain-experimental works fine with the latest *published* langchain,
  # but is broken with the langchain on `master`.
  #
  # We want to catch situations like that *before* releasing a new langchain, hence this test.
  test-with-latest-langchain:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: ${{ env.WORKDIR }}
    strategy:
      matrix:
        python-version:
          - "3.8"
          - "3.9"
          - "3.10"
          - "3.11"
    name: test with unpublished langchain - Python ${{ matrix.python-version }}
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
        uses: "./.github/actions/poetry_setup"
        with:
          python-version: ${{ matrix.python-version }}
          poetry-version: ${{ env.POETRY_VERSION }}
          working-directory: ${{ env.WORKDIR }}
          cache-key: unpublished-langchain

      - name: Install dependencies
        shell: bash
        run: |
          echo "Running tests with unpublished langchain, installing dependencies with poetry..."
          poetry install

          echo "Editably installing langchain outside of poetry, to avoid messing up lockfile..."
          poetry run pip install -e ../langchain
          poetry run pip install -e ../core

      - name: Run tests
        run: make test

  extended-tests:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: ${{ env.WORKDIR }}
    strategy:
      matrix:
        python-version:
          - "3.8"
          - "3.9"
          - "3.10"
          - "3.11"
    name: Python ${{ matrix.python-version }} extended tests
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
        uses: "./.github/actions/poetry_setup"
        with:
          python-version: ${{ matrix.python-version }}
          poetry-version: ${{ env.POETRY_VERSION }}
          working-directory: ${{ env.WORKDIR }}
          cache-key: extended

      - name: Install dependencies
        shell: bash
        run: |
          echo "Running extended tests, installing dependencies with poetry..."
          poetry install -E extended_testing

      - name: Run extended tests
        run: make extended_tests

      - name: Ensure the tests did not create any additional files
        shell: bash
        run: |
          set -eu

          STATUS="$(git status)"
          echo "$STATUS"

          # grep will exit non-zero if the target message isn't found,
          # and `set -e` above will cause the step to fail.
          echo "$STATUS" | grep 'nothing to commit, working tree clean'