Commit Graph

3 Commits

Author SHA1 Message Date
Harrison Chase
07d7096de6
Harrison/playwright (#2871)
Co-authored-by: Manuel Saelices <msaelices@gmail.com>
2023-04-13 22:15:03 -07:00
Kevin Huang
e4cfaa5680
Introduces SeleniumURLLoader for JavaScript-Dependent Web Page Data Retrieval (#2291)
### Summary
This PR introduces a `SeleniumURLLoader` which, similar to
`UnstructuredURLLoader`, loads data from URLs. However, it utilizes
`selenium` to fetch page content, enabling it to work with
JavaScript-rendered pages. The `unstructured` library is also employed
for loading the HTML content.

### Testing
```bash
pip install selenium
pip install unstructured
```

```python
from langchain.document_loaders import SeleniumURLLoader

urls = [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://goo.gl/maps/NDSHwePEyaHMFGwh8"
]

loader = SeleniumURLLoader(urls=urls)
data = loader.load()
```
2023-04-02 14:05:00 -07:00
Harrison Chase
705431aecc
big docs refactor (#1978)
Co-authored-by: Ankush Gola <ankush.gola@gmail.com>
2023-03-26 19:49:46 -07:00