# langchain/.github/workflows/langchain_experimental_ci.yml

---
name: libs/experimental CI

on:
  push:
    branches: [ master ]
  pull_request:
    paths:
      - '.github/actions/poetry_setup/action.yml'
      - '.github/tools/**'
      - '.github/workflows/_lint.yml'
      - '.github/workflows/_test.yml'
      - '.github/workflows/langchain_experimental_ci.yml'
      - 'libs/*'
      - 'libs/experimental/**'
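      # Note on the patterns above: in GitHub's filter-pattern syntax `*` does not
      # match `/`, so 'libs/*' only matches files that sit directly inside libs/,
      # while 'libs/experimental/**' matches every file at any depth below
      # libs/experimental/.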
  workflow_dispatch:  # Allows triggering the workflow manually from the GitHub UI

# If another push to the same PR or branch happens while this workflow is still running,
# cancel the earlier run in favor of the next run.
#
# There's no point in testing an outdated version of the code. GitHub only allows
# a limited number of job runners to be active at the same time, so it's better to cancel
# pointless jobs early so that more useful jobs can run sooner.
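#
# For example (using a hypothetical PR #123), every push to that pull request
# resolves to the group "libs/experimental CI-refs/pull/123/merge", so a newer
# push cancels the still-running workflow for the older one. Pushes to master
# all share the group "libs/experimental CI-refs/heads/master", so an earlier
# master run is likewise cancelled when a newer master push arrives.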
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

env:
  POETRY_VERSION: "1.6.1"
  WORKDIR: "libs/experimental"

jobs:
  lint:
    uses:
      ./.github/workflows/_lint.yml
    with:
      working-directory: libs/experimental
      langchain-location: ../langchain
    secrets: inherit

  test:
    uses:
      ./.github/workflows/_test.yml
    with:
      working-directory: libs/experimental
    secrets: inherit

  compile-integration-tests:
    uses:
      ./.github/workflows/_compile_integration_test.yml
    with:
      working-directory: libs/experimental
    secrets: inherit

  # It's possible that langchain-experimental works fine with the latest *published* langchain,
  # but is broken with the langchain on `master`.
  #
  # We want to catch situations like that *before* releasing a new langchain, hence this test.
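  #
  # To approximate this job locally (a sketch, assuming Poetry is installed and
  # the commands are run from the repository root), mirror its steps:
  #   cd libs/experimental
  #   poetry install
  #   poetry run pip install -e ../langchain
  #   make test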
  test-with-latest-langchain:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: ${{ env.WORKDIR }}
    strategy:
      matrix:
        python-version:
          - "3.8"
          - "3.9"
          - "3.10"
          - "3.11"
    name: test with unpublished langchain - Python ${{ matrix.python-version }}
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
        uses: "./.github/actions/poetry_setup"
        with:
          python-version: ${{ matrix.python-version }}
          poetry-version: ${{ env.POETRY_VERSION }}
          working-directory: ${{ env.WORKDIR }}
          cache-key: unpublished-langchain

      - name: Install dependencies
        shell: bash
        run: |
          echo "Running tests with unpublished langchain, installing dependencies with poetry..."
          poetry install

          echo "Installing langchain in editable mode outside of Poetry, to avoid modifying the lockfile..."
          poetry run pip install -e ../langchain

      - name: Run tests
        run: make test

  extended-tests:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: ${{ env.WORKDIR }}
    strategy:
      matrix:
        python-version:
          - "3.8"
          - "3.9"
          - "3.10"
          - "3.11"
    name: Python ${{ matrix.python-version }} extended tests
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python ${{ matrix.python-version }} + Poetry ${{ env.POETRY_VERSION }}
        uses: "./.github/actions/poetry_setup"
        with:
          python-version: ${{ matrix.python-version }}
          poetry-version: ${{ env.POETRY_VERSION }}
          working-directory: ${{ env.WORKDIR }}
          cache-key: extended

      - name: Install dependencies
        shell: bash
        run: |
          echo "Running extended tests, installing dependencies with poetry..."
          poetry install -E extended_testing

      - name: Run extended tests
        run: make extended_tests

      - name: Ensure the tests did not create any additional files
        shell: bash
        run: |
          set -eu

          STATUS="$(git status)"
          echo "$STATUS"

          # grep will exit non-zero if the target message isn't found,
          # and `set -e` above will cause the step to fail.
          echo "$STATUS" | grep 'nothing to commit, working tree clean'
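
          # Note: the check above matches git's human-readable output, whose wording
          # can vary with the git version and the locale. A sketch of a
          # locale-independent alternative (not what this workflow currently runs)
          # is to require that the machine-readable porcelain output is empty:
          #   test -z "$(git status --porcelain)"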