langchain/tests/unit_tests/document_loaders/test_web_base.py

11 lines
473 B
Python
Raw Normal View History

Respect User-Specified User-Agent in WebBaseLoader (#4579) # Respect User-Specified User-Agent in WebBaseLoader This pull request modifies the `WebBaseLoader` class initializer from the `langchain.document_loaders.web_base` module to preserve any User-Agent specified by the user in the `header_template` parameter. Previously, even if a User-Agent was specified in `header_template`, it would always be overridden by a random User-Agent generated by the `fake_useragent` library. With this change, if a User-Agent is specified in `header_template`, it will be used. Only in the case where no User-Agent is specified will a random User-Agent be generated and used. This provides additional flexibility when using the `WebBaseLoader` class, allowing users to specify their own User-Agent if they have a specific need or preference, while still providing a reasonable default for cases where no User-Agent is specified. This change has no impact on existing users who do not specify a User-Agent, as the behavior in this case remains the same. However, for users who do specify a User-Agent, their choice will now be respected and used for all subsequent requests made using the `WebBaseLoader` class. Fixes #4167 ## Before submitting ============================= test session starts ============================== collecting ... collected 1 item test_web_base.py::TestWebBaseLoader::test_respect_user_specified_user_agent ============================== 1 passed in 3.64s =============================== PASSED [100%] ## Who can review? Community members can review the PR once tests pass. Tag maintainers/contributors who might be interested: @eyurtsev --------- Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2023-05-15 03:09:27 +00:00
from langchain.document_loaders.web_base import WebBaseLoader
class TestWebBaseLoader:
def test_respect_user_specified_user_agent(self) -> None:
user_specified_user_agent = "user_specified_user_agent"
header_template = {"User-Agent": user_specified_user_agent}
url = "https://www.example.com"
loader = WebBaseLoader(url, header_template=header_template)
assert loader.session.headers["User-Agent"] == user_specified_user_agent