Strip sitemap entries (#2132)

Loading this sitemap didn't work for me: https://www.alzallies.com/sitemap.xml

Stripping surrounding whitespace from each loc URL before scraping fixed it, and it seems like a good idea to do in general.

Integration tests pass
Sebastien Kerbrat 2023-03-28 22:56:07 -07:00 committed by GitHub
parent 27f80784d0
commit 4ab66c4f52
GPG Key ID: 4AEE18F83AFDEB23

@@ -58,7 +58,7 @@ class SitemapLoader(WebBaseLoader):
         els = self.parse_sitemap(soup)
-        results = self.scrape_all([el["loc"] for el in els if "loc" in el])
+        results = self.scrape_all([el["loc"].strip() for el in els if "loc" in el])
         return [
             Document(
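The sketch below illustrates why the strip matters. It assumes, as the fix suggests, that the failing sitemap wrapped some of its URLs in whitespace. It uses the standard library's xml.etree.ElementTree rather than the loader's own soup-based parsing, and the sitemap snippet, the example.com URLs and the extract_locs helper are all hypothetical. A loc element whose text is padded with a newline and indentation is still valid XML, but its raw text is not a usable URL until it is stripped.

```python
import xml.etree.ElementTree as ET

# Hypothetical sitemap: the first <loc> is padded with a newline and
# indentation (valid XML), the second is already clean.
SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>
      https://example.com/page-1
    </loc>
  </url>
  <url>
    <loc>https://example.com/page-2</loc>
  </url>
</urlset>"""

# Sitemap elements live in this namespace, so lookups must be qualified.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_locs(xml_text: str) -> list[str]:
    """Hypothetical helper: return every <url>/<loc> value, whitespace-stripped."""
    root = ET.fromstring(xml_text)
    locs = []
    for loc in root.findall("sm:url/sm:loc", NS):
        if loc.text is not None:
            # Same idea as the el["loc"].strip() change in the diff.
            locs.append(loc.text.strip())
    return locs


# Without stripping, the first URL still carries the surrounding whitespace.
raw = [loc.text for loc in ET.fromstring(SITEMAP).findall("sm:url/sm:loc", NS)]
print(repr(raw[0]))
print(extract_locs(SITEMAP))
```

Stripping is a no-op on URLs that are already clean, so applying it to every entry only changes the padded ones. That is why it is reasonable to do in general rather than only for the problem sitemap.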