langchain

mirror of https://github.com/hwchase17/langchain synced 2024-10-29 17:07:25 +00:00

History

Kevin Huang e4cfaa5680 Introduces SeleniumURLLoader for JavaScript-Dependent Web Page Data Retrieval (#2291 ) ### Summary This PR introduces a `SeleniumURLLoader` which, similar to `UnstructuredURLLoader`, loads data from URLs. However, it utilizes `selenium` to fetch page content, enabling it to work with JavaScript-rendered pages. The `unstructured` library is also employed for loading the HTML content. ### Testing ```bash pip install selenium pip install unstructured ``` ```python from langchain.document_loaders import SeleniumURLLoader urls = [ "https://www.youtube.com/watch?v=dQw4w9WgXcQ", "https://goo.gl/maps/NDSHwePEyaHMFGwh8" ] loader = SeleniumURLLoader(urls=urls) data = loader.load() ```	2023-04-02 14:05:00 -07:00
..
examples	Introduces SeleniumURLLoader for JavaScript-Dependent Web Page Data Retrieval (#2291 )	2023-04-02 14:05:00 -07:00

Introduces SeleniumURLLoader for JavaScript-Dependent Web Page Data Retrieval (#2291 )

### Summary
This PR introduces a `SeleniumURLLoader` which, similar to
`UnstructuredURLLoader`, loads data from URLs. However, it utilizes
`selenium` to fetch page content, enabling it to work with
JavaScript-rendered pages. The `unstructured` library is also employed
for loading the HTML content.

### Testing
```bash
pip install selenium
pip install unstructured
```

```python
from langchain.document_loaders import SeleniumURLLoader

urls = [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://goo.gl/maps/NDSHwePEyaHMFGwh8"
]

loader = SeleniumURLLoader(urls=urls)
data = loader.load()
```

2023-04-02 14:05:00 -07:00

examples

Introduces SeleniumURLLoader for JavaScript-Dependent Web Page Data Retrieval (#2291 )

2023-04-02 14:05:00 -07:00