langchain/tests/integration_tests/document_loaders
Luke Harris b4de839ed8
Several confluence loader improvements (#3300)
This PR addresses several improvements:

- Previously it was not possible to load spaces of more than 100 pages.
The `limit` was being used both as an overall page limit *and* as a
per-request pagination limit. This, in combination with the fact that
Atlassian seems to use a server-side hard limit of 100 when page content
is expanded, meant it wasn't possible to download >100 pages. Now
`limit` is used *only* as a per-request pagination limit and `max_pages`
is introduced as the way to limit the total number of pages returned by
the paginator.
- Document metadata now includes `source` (the source URL), making it
compatible with `RetrievalQAWithSourcesChain`.
- It is now possible to include inline and footer comments.
- It is now possible to pass `verify_ssl=False` and other parameters to
the underlying Confluence client object for use cases that require it.
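
The split between `limit` and `max_pages` described in the first bullet can be illustrated with a minimal sketch. This is not the loader's actual code: `fake_fetch`, `paginate`, and `SERVER_HARD_LIMIT` are hypothetical names, and the fake server simply mimics the assumed hard cap of 100 results per request.

```python
from typing import Callable, Dict, List, Optional

# Assumed server-side cap per request, as described in the commit message.
SERVER_HARD_LIMIT = 100


def fake_fetch(total: int) -> Callable[..., List[Dict]]:
    """Build a hypothetical stand-in for an API call over a space of `total` pages."""

    def fetch(start: int = 0, limit: int = 50) -> List[Dict]:
        # The server never returns more than its hard limit per request,
        # no matter how large a `limit` the client asks for.
        size = min(limit, SERVER_HARD_LIMIT)
        return [{"id": i} for i in range(start, min(start + size, total))]

    return fetch


def paginate(
    fetch: Callable[..., List[Dict]],
    limit: int = 50,
    max_pages: Optional[int] = 1000,
) -> List[Dict]:
    """Collect results batch by batch.

    `limit` only sizes each request; `max_pages` caps the overall total
    (None means no cap).
    """
    docs: List[Dict] = []
    while max_pages is None or len(docs) < max_pages:
        batch = fetch(start=len(docs), limit=limit)
        if not batch:
            break  # the space is exhausted
        docs.extend(batch)
    # The last batch may overshoot the cap, so trim to max_pages.
    return docs if max_pages is None else docs[:max_pages]


if __name__ == "__main__":
    pages = paginate(fake_fetch(250), limit=100, max_pages=None)
    print(len(pages))
```

Because the loop keeps requesting batches until the server returns nothing or the cap is hit, a space larger than the per-request limit can be read in full, which is the behaviour the first bullet says was previously impossible.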
1 year ago
..
__init__.py
test_bigquery.py
test_bilibili.py Added bilibili loader (#2673) (#2724) 1 year ago
test_bshtml.py Add ability to pass kwargs to loader classes in `DirectoryLoader`, add ability to modify encoding and BeautifulSoup behaviour in `BSHTMLLoader` (#2275) 2 years ago
test_confluence.py Several confluence loader improvements (#3300) 1 year ago
test_dataframe.py rm pandas dependency (#2102) 2 years ago
test_duckdb.py
test_email.py Harrison/msg files (#2375) 2 years ago
test_figma.py
test_gitbook.py Harrison/gitbook (#2044) 2 years ago
test_ifixit.py
test_pdf.py Add new loader to load pdf as html content (#2607) 2 years ago
test_python.py Add PythonLoader which auto-detects encoding of Python files (#3311) 1 year ago
test_sitemap.py
test_slack.py Add Slack Directory Loader (#2841) 1 year ago
test_url.py add continue to fix 'continue_on_failure' parameter for URL doc loader (#2735) 1 year ago
test_url_playwright.py Harrison/playwright selector (#3185) 1 year ago