langchain/libs/experimental/Makefile

.PHONY: all format format_diff lint lint_diff test tests test_watch integration_tests extended_tests spell_check spell_fix help

# Default target executed when no arguments are given to make.
all: help

# Define a variable for the test file path.
TEST_FILE ?= tests/unit_tests/
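# Because TEST_FILE is set with ?=, it can be overridden on the command line.
# A minimal sketch (the file name below is hypothetical, for illustration only):
#   make test TEST_FILE=tests/unit_tests/test_example.py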

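# Run unit tests (`tests` is an alias for `test`).
test tests:
	poetry run pytest $(TEST_FILE)

# Re-run unit tests whenever files change (needs the `ptw` watcher).
test_watch:
	poetry run ptw --now . -- tests/unit_tests

# Run only the extended unit tests (`--only-extended` is a project-specific
# pytest option, not a built-in one).
extended_tests:
	poetry run pytest --only-extended tests/unit_tests

# Run integration tests.
integration_tests:
	poetry run pytest tests/integration_tests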


######################
# LINTING AND FORMATTING
######################

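# Files to lint and format. The default is the whole package directory.
PYTHON_FILES=.
lint format: PYTHON_FILES=.
# The *_diff targets only look at Python and notebook files changed relative to
# master. --relative makes git print paths relative to libs/experimental
# instead of the repository root, so they resolve when make runs from here.
lint_diff format_diff: PYTHON_FILES=$(shell git diff --relative=libs/experimental --name-only --diff-filter=d master | grep -E '\.py$$|\.ipynb$$')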

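# Type-check, verify formatting without rewriting files, then run the linter.
# Note: the final ruff call always checks the whole directory, even for lint_diff.
lint lint_diff:
	poetry run mypy $(PYTHON_FILES)
	poetry run ruff format $(PYTHON_FILES) --diff
	poetry run ruff .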

# Reformat files in place, then sort imports (ruff rule set I).
format format_diff:
	poetry run ruff format $(PYTHON_FILES)
	poetry run ruff --select I --fix $(PYTHON_FILES)

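# Report spelling mistakes using the codespell settings in pyproject.toml.
spell_check:
	poetry run codespell --toml pyproject.toml

# Same as spell_check, but write the corrections back to the files (-w).
spell_fix:
	poetry run codespell --toml pyproject.toml -w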

######################
# HELP
######################

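help:
	@echo '----'
	@echo 'format                       - run code formatters'
	@echo 'format_diff                  - run code formatters on files changed vs master'
	@echo 'lint                         - run linters'
	@echo 'lint_diff                    - run linters on files changed vs master'
	@echo 'spell_check                  - check spelling with codespell'
	@echo 'spell_fix                    - fix spelling with codespell'
	@echo 'test                         - run unit tests'
	@echo 'tests                        - run unit tests'
	@echo 'test TEST_FILE=<test_file>   - run all tests in file'
	@echo 'test_watch                   - run unit tests in watch mode'
	@echo 'extended_tests               - run extended unit tests'
	@echo 'integration_tests            - run integration tests'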