Fix encoding issue in WebBaseLoader (#3602)

The character code mismatches occurred when character information was
not included in the response header (In my case, a Japanese web page).
I solved this issue by changing the encoding setting to
apparent_encoding.
fix_agent_callbacks
Kohei Kumazaki 1 year ago committed by GitHub
parent be7a8e0824
commit fa4c35e9e5
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

@ -169,6 +169,7 @@ class WebBaseLoader(BaseLoader):
self._check_parser(parser)
html_doc = self.session.get(url)
html_doc.encoding = html_doc.apparent_encoding
return BeautifulSoup(html_doc.text, parser)
def scrape(self, parser: Union[str, None] = None) -> Any:

Loading…
Cancel
Save