langchain/docs/modules/indexes/document_loaders/examples
Kevin Huang e4cfaa5680
Introduces SeleniumURLLoader for JavaScript-Dependent Web Page Data Retrieval (#2291)
### Summary
This PR adds `SeleniumURLLoader`, which loads data from URLs much like
`UnstructuredURLLoader`. Unlike that loader, it fetches page content with
`selenium`, so it also works on JavaScript-rendered pages. The fetched HTML
is then loaded with the `unstructured` library.
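
The fetch-then-parse flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `BrowserLike`, `html_to_text`, and `load_rendered_pages` are hypothetical names, plain dicts stand in for document objects, and the standard-library `html.parser` stands in for `unstructured` so the sketch has no third-party dependencies. The only browser members it relies on are `get()` and `page_source`, which a selenium WebDriver provides.

```python
from html.parser import HTMLParser
from typing import Iterable, List, Protocol


class BrowserLike(Protocol):
    """Hypothetical subset of the WebDriver interface used by this sketch."""

    page_source: str

    def get(self, url: str) -> None: ...


class _TextExtractor(HTMLParser):
    """Collects visible text and skips the bodies of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def html_to_text(html: str) -> str:
    """Stand-in for the HTML loading step (assumption: not `unstructured`)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def load_rendered_pages(driver: BrowserLike, urls: Iterable[str]) -> List[dict]:
    """Visit each URL in the browser, then extract text from the rendered DOM."""
    docs = []
    for url in urls:
        driver.get(url)  # the browser runs the page's JavaScript
        docs.append(
            {
                "page_content": html_to_text(driver.page_source),
                "metadata": {"source": url},
            }
        )
    return docs
```

With selenium installed, a real driver such as `webdriver.Chrome()` could be passed as `driver`; any object exposing `get()` and `page_source` works equally well, which keeps the sketch testable without a browser.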

### Testing
```bash
pip install selenium
pip install unstructured
```

```python
from langchain.document_loaders import SeleniumURLLoader

# Both pages render their main content with JavaScript.
urls = [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://goo.gl/maps/NDSHwePEyaHMFGwh8",
]

loader = SeleniumURLLoader(urls=urls)
# load() returns the loaded documents.
data = loader.load()
```
2023-04-02 14:05:00 -07:00
example_data Harrison/whatsapp loader (#2085) 2023-03-27 23:43:45 -07:00
airbyte_json.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
apify_dataset.ipynb Harrison/apify (#2215) 2023-03-30 20:58:14 -07:00
azlyrics.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
azure_blob_storage_container.ipynb Harrison/site map (#2061) 2023-03-27 16:28:08 -07:00
azure_blob_storage_file.ipynb Harrison/site map (#2061) 2023-03-27 16:28:08 -07:00
bigquery.ipynb Harrison/big query (#2100) 2023-03-28 08:17:22 -07:00
blackboard.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
college_confidential.ipynb Harrison/site map (#2061) 2023-03-27 16:28:08 -07:00
CoNLL-U.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
copypaste.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
csv.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
dataframe.ipynb Harrison/document cleanup (#2062) 2023-03-27 16:32:55 -07:00
directory_loader.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
duckdb.ipynb Harrison/duckdb (#2064) 2023-03-27 19:51:34 -07:00
email.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
epub.ipynb bump version to 128 (#2236) 2023-03-31 11:16:21 -07:00
evernote.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
facebook_chat.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
figma.ipynb [Documents] Updated Figma docs and added example (#2172) 2023-03-29 22:11:45 -07:00
gcs_directory.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
gcs_file.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
gitbook.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
googledrive.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
gutenberg.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
hn.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
html.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
ifixit.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
image.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
imsdb.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
markdown.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
notebook.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
notion.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
notiondb.ipynb feat: Add Notion database document loader (#2056) 2023-03-28 08:07:09 -07:00
obsidian.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
pdf.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
powerpoint.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
readthedocs_documentation.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
roam.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
s3_directory.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
s3_file.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
sitemap.ipynb docs: tiny fix on docs verbiage (#2124) 2023-03-28 22:56:29 -07:00
srt.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
telegram.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
unstructured_file.ipynb feat: document loader for epublications (#2202) 2023-03-30 20:45:31 -07:00
url.ipynb Introduces SeleniumURLLoader for JavaScript-Dependent Web Page Data Retrieval (#2291) 2023-04-02 14:05:00 -07:00
web_base.ipynb Harrison/site map (#2061) 2023-03-27 16:28:08 -07:00
whatsapp_chat.ipynb Harrison/whatsapp loader (#2085) 2023-03-27 23:43:45 -07:00
word_document.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00
youtube.ipynb big docs refactor (#1978) 2023-03-26 19:49:46 -07:00