3b6206af49
# Respect User-Specified User-Agent in WebBaseLoader This pull request modifies the `WebBaseLoader` class initializer from the `langchain.document_loaders.web_base` module to preserve any User-Agent specified by the user in the `header_template` parameter. Previously, even if a User-Agent was specified in `header_template`, it would always be overridden by a random User-Agent generated by the `fake_useragent` library. With this change, if a User-Agent is specified in `header_template`, it will be used. Only in the case where no User-Agent is specified will a random User-Agent be generated and used. This provides additional flexibility when using the `WebBaseLoader` class, allowing users to specify their own User-Agent if they have a specific need or preference, while still providing a reasonable default for cases where no User-Agent is specified. This change has no impact on existing users who do not specify a User-Agent, as the behavior in this case remains the same. However, for users who do specify a User-Agent, their choice will now be respected and used for all subsequent requests made using the `WebBaseLoader` class. Fixes #4167 ## Before submitting ============================= test session starts ============================== collecting ... collected 1 item test_web_base.py::TestWebBaseLoader::test_respect_user_specified_user_agent ============================== 1 passed in 3.64s =============================== PASSED [100%] ## Who can review? Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested: @eyurtsev --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com> |
||
---|---|---|
.. | ||
integration_tests | ||
mock_servers | ||
unit_tests | ||
__init__.py | ||
data.py | ||
README.md |
Readme tests(draft)
Integrations Tests
Prepare
This repository contains functional tests for several search engines and databases. The tests aim to verify the correct behavior of the engines and databases according to their specifications and requirements.
To run some integration tests, such as tests located in
tests/integration_tests/vectorstores/
, you will need to install the following
software:
- Docker
- Python 3.8.1 or later
We have optional group test_integration
in the pyproject.toml
file. This group
should contain dependencies for the integration tests and can be installed using the
command:
poetry install --with test_integration
Any new dependencies should be added by running:
# add package and install it after adding:
poetry add tiktoken@latest --group "test_integration" && poetry install --with test_integration
Before running any tests, you should start a specific Docker container that has all the
necessary dependencies installed. For instance, we use the elasticsearch.yml
container
for test_elasticsearch.py
:
cd tests/integration_tests/vectorstores/docker-compose
docker-compose -f elasticsearch.yml up
Prepare environment variables for local testing:
- copy
tests/.env.example
totests/.env
- set variables in
tests/.env
file, e.gOPENAI_API_KEY
Additionally, it's important to note that some integration tests may require certain
environment variables to be set, such as OPENAI_API_KEY
. Be sure to set any required
environment variables before running the tests to ensure they run correctly.
Recording HTTP interactions with pytest-vcr
Some of the integration tests in this repository involve making HTTP requests to external services. To prevent these requests from being made every time the tests are run, we use pytest-vcr to record and replay HTTP interactions.
When running tests in a CI/CD pipeline, you may not want to modify the existing cassettes. You can use the --vcr-record=none command-line option to disable recording new cassettes. Here's an example:
pytest --log-cli-level=10 tests/integration_tests/vectorstores/test_pinecone.py --vcr-record=none
pytest tests/integration_tests/vectorstores/test_elasticsearch.py --vcr-record=none
Run some tests with coverage:
pytest tests/integration_tests/vectorstores/test_elasticsearch.py --cov=langchain --cov-report=html
start "" htmlcov/index.html || open htmlcov/index.html