langchain/docs/modules/indexes
Kevin Huang e4cfaa5680
Introduces SeleniumURLLoader for JavaScript-Dependent Web Page Data Retrieval (#2291)
### Summary
This PR introduces a `SeleniumURLLoader` which, like
`UnstructuredURLLoader`, loads data from URLs. Unlike that loader,
however, it uses `selenium` to fetch page content through a browser, so
it also works with pages whose content is rendered by JavaScript. The
`unstructured` library is then used to parse the fetched HTML.
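The fetch-then-parse flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the `SeleniumURLLoader` source. `Document`, `selenium_fetch`, `load_urls`, and `continue_on_failure` are hypothetical names chosen for this sketch, and Python's standard `html.parser` stands in for `unstructured`'s HTML parsing. The Selenium fetcher assumes Selenium 4 and a locally available Chrome driver. Because the fetcher is injectable, the parsing path can be exercised without a browser.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import Callable, Dict, List


@dataclass
class Document:
    """Hypothetical stand-in for a loaded document: text plus metadata."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


class _TextExtractor(HTMLParser):
    """Collects visible text and skips <script>/<style> contents.
    A simplified substitute for the `unstructured` HTML parsing step."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._skip = 0  # depth inside <script>/<style> elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def selenium_fetch(url: str) -> str:
    """Return the page source after the browser has run the page's JavaScript.
    Assumes Selenium 4 and Chrome; imported lazily so the rest runs without it."""
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()


def load_urls(
    urls: List[str],
    fetch: Callable[[str], str] = selenium_fetch,
    continue_on_failure: bool = True,
) -> List[Document]:
    """Fetch each URL's rendered HTML, extract its text, and record the source URL."""
    docs: List[Document] = []
    for url in urls:
        try:
            html = fetch(url)
        except Exception:
            if continue_on_failure:
                continue  # skip URLs that fail to load
            raise
        parser = _TextExtractor()
        parser.feed(html)
        parser.close()
        docs.append(Document("\n".join(parser.parts), {"source": url}))
    return docs
```

Passing a stub as `fetch` makes the loop testable offline. Starting a fresh browser for every URL keeps the sketch short but is slow; a real loader would more plausibly open one driver and reuse it across all URLs.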

### Testing
```bash
pip install selenium
pip install unstructured
```

```python
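from langchain.document_loaders import SeleniumURLLoader

# Pages whose content is rendered client-side by JavaScript
urls = [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://goo.gl/maps/NDSHwePEyaHMFGwh8",
]

# Fetch each URL with Selenium, then parse the HTML with unstructured
loader = SeleniumURLLoader(urls=urls)
data = loader.load()  # list of Document objects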
```
2 years ago
| Name | Last commit | Updated |
|---|---|---|
| document_loaders/examples | Introduces SeleniumURLLoader for JavaScript-Dependent Web Page Data Retrieval (#2291) | 2 years ago |
| retrievers/examples | Fix typo in documentation: vectorstore-retriever.ipynb (#2306) | 2 years ago |
| text_splitters | Update huggingface_length_function.ipynb (#2203) | 2 years ago |
| vectorstores | Add Zilliz example (#2288) | 2 years ago |
| document_loaders.rst | big docs refactor (#1978) | 2 years ago |
| getting_started.ipynb | big docs refactor (#1978) | 2 years ago |
| retrievers.rst | big docs refactor (#1978) | 2 years ago |
| text_splitters.rst | big docs refactor (#1978) | 2 years ago |
| vectorstores.rst | big docs refactor (#1978) | 2 years ago |