Web Loader: Add proxy support (#6792)

Proxies help once you start querying websites with more aggressive
anti-bot measures.

[Proxy
services](https://developers.oxylabs.io/advanced-proxy-solutions/web-unblocker/making-requests)
(of which there are many) make it easy to rotate IPs and avoid bans, and
`requests` supports them directly: you just pass it a simple dict of proxies.
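As a rough illustration of client-side rotation, the sketch below cycles through a pool of proxy URLs and yields a fresh `requests`-style proxies dict each time. The pool entries, hostnames and credentials are placeholders, and `rotating_proxies` is a hypothetical helper, not part of LangChain or `requests`.

```python
from itertools import cycle
from typing import Dict, Iterator, List

# Placeholder pool of proxy endpoints (hypothetical hosts and credentials);
# real values come from whichever proxy service you use.
PROXY_POOL: List[str] = [
    "http://user:pass@proxy-1.example.com:8000",
    "http://user:pass@proxy-2.example.com:8000",
    "http://user:pass@proxy-3.example.com:8000",
]


def rotating_proxies(pool: List[str]) -> Iterator[Dict[str, str]]:
    """Yield a requests-style proxies dict, cycling through `pool` forever."""
    for proxy_url in cycle(pool):
        # Route both plain-HTTP and HTTPS targets through the same proxy.
        yield {"http": proxy_url, "https": proxy_url}


proxies = rotating_proxies(PROXY_POOL)
first = next(proxies)
second = next(proxies)
print(first["https"])   # http://user:pass@proxy-1.example.com:8000
print(second["https"])  # http://user:pass@proxy-2.example.com:8000
```

Each yielded dict has the shape `requests` expects for its `proxies` argument, so it could be passed to `requests.get(url, proxies=...)` or to `WebBaseLoader(url, proxies=...)`. Some proxy services rotate IPs behind a single endpoint, in which case one fixed dict is enough.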

CC @rlancemartin, @eyurtsev
pull/6030/head
Tim Asp 1 year ago committed by GitHub
parent f92ccf70fd
commit 3ca1a387c2
GPG Key ID: 4AEE18F83AFDEB23

@@ -224,13 +224,33 @@
"docs"
]
},
{
"cell_type": "markdown",
"source": [
"## Using proxies\n",
"\n",
"Sometimes you may need proxies to get around IP blocks. You can pass a dictionary of proxies to the loader, which applies them to its underlying `requests` session."
],
"metadata": {
"collapsed": false
}
},
{
"cell_type": "code",
"execution_count": null,
"id": "1dd8ab23",
"metadata": {},
"outputs": [],
"source": [
"loader = WebBaseLoader(\n",
"    \"https://www.walmart.com/search?q=parrots\", proxies={\n",
"        \"http\": \"http://{username}:{password}@proxy.service.com:6666/\",\n",
"        \"https\": \"https://{username}:{password}@proxy.service.com:6666/\"\n",
"    }\n",
")\n",
"docs = loader.load()\n"
],
"metadata": {
"collapsed": false
}
}
],
"metadata": {

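Proxy URLs carry credentials in the userinfo part, `scheme://username:password@host:port`. The standard-library-only sketch below builds such a URL and parses it back with `urllib.parse.urlsplit` to show how the pieces are split. `proxy_url` is a hypothetical helper, and the credentials and host are placeholders. Percent-encoding the username and password keeps characters such as `@` or `:` from being read as URL delimiters; by the same parsing rules, a stray `:` right before `@` becomes part of the password.

```python
from urllib.parse import quote, unquote, urlsplit


def proxy_url(username: str, password: str, host: str, port: int,
              scheme: str = "http") -> str:
    """Build scheme://username:password@host:port with percent-encoded credentials."""
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"{scheme}://{user}:{pwd}@{host}:{port}"


url = proxy_url("alice", "s3cret", "proxy.service.com", 6666)
parts = urlsplit(url)
print(parts.username, parts.password, parts.hostname, parts.port)
# alice s3cret proxy.service.com 6666

# A password containing delimiter characters survives the round trip once
# percent-encoded (urlsplit hands it back still encoded, so unquote it).
tricky = urlsplit(proxy_url("bob", "p@ss:word", "proxy.service.com", 6666))
print(unquote(tricky.password))  # p@ss:word

# An extra ':' right before '@' is parsed as the last character of the password.
stray = urlsplit("http://alice:s3cret:@proxy.service.com:6666/")
print(stray.password)  # s3cret:
```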
@@ -61,6 +61,7 @@ class WebBaseLoader(BaseLoader):
        web_path: Union[str, List[str]],
        header_template: Optional[dict] = None,
        verify: Optional[bool] = True,
        proxies: Optional[dict] = None,
    ):
        """Initialize with webpage path."""
@@ -97,6 +98,9 @@ class WebBaseLoader(BaseLoader):
                )
        self.session.headers = dict(headers)
        if proxies:
            self.session.proxies.update(proxies)

    @property
    def web_path(self) -> str:
        if len(self.web_paths) > 1:
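`self.session.proxies.update(proxies)` stores the dict on the loader's `requests.Session`, so requests sent through that session consult it. The standard-library-only sketch below approximates how `requests` chooses an entry for a given target URL, following the lookup order of its `requests.utils.select_proxy` helper: `scheme://host`, then `scheme`, then `all://host`, then `all`. `pick_proxy` is a hypothetical name, the proxy hosts are placeholders, and the sketch ignores environment variables such as `HTTPS_PROXY` and `NO_PROXY`, which `requests` can also take into account.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit


def pick_proxy(url: str, proxies: Dict[str, str]) -> Optional[str]:
    """Return the proxy a requests-style proxies dict assigns to `url`, or None.

    Keys are tried from most to least specific:
    "scheme://host", "scheme", "all://host", "all".
    """
    parts = urlsplit(url)
    scheme, host = parts.scheme, parts.hostname
    if host is None:
        # No hostname to match on: fall back to the scheme, then "all".
        return proxies.get(scheme, proxies.get("all"))
    for key in (f"{scheme}://{host}", scheme, f"all://{host}", "all"):
        if key in proxies:
            return proxies[key]
    return None  # no match: the request would go out without a proxy


proxies = {
    "http": "http://proxy-a.example.com:8000",
    "https": "http://proxy-b.example.com:8000",
    "https://docs.example.com": "http://proxy-c.example.com:8000",
}
print(pick_proxy("https://www.walmart.com/search?q=parrots", proxies))
# http://proxy-b.example.com:8000
print(pick_proxy("https://docs.example.com/page", proxies))
# http://proxy-c.example.com:8000
```

With only `"http"` and `"https"` keys, as in the notebook example, the entry is chosen purely by the target URL's scheme.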
