German Swan
d4dc98a9f9
community[patch]: RecursiveUrlLoader: add base_url option ( #19421 )
...
RecursiveUrlLoader does not currently provide an option to set
`base_url` other than the `url`, though it uses a function with such an
option.
For example, this causes it unable to parse the
`https://python.langchain.com/docs `, as it returns the 404 page, and
`https://python.langchain.com/docs/get_started/introduction ` has no
child routes to parse.
`base_url` allows setting the `https://python.langchain.com/docs ` to
filter by, while the starting URL is anything inside, that contains
relevant links to continue crawling.
I understand that for this case, the docusaurus loader could be used,
but it's a common issue with many websites.
---------
Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-03-22 15:34:31 -07:00
Hemslo Wang
58a2abf089
community[patch]: fix RecursiveUrlLoader metadata_extractor return type ( #18193 )
...
**Description:** Fix `metadata_extractor` type for `RecursiveUrlLoader`,
the default `_metadata_extractor` returns `dict` instead of `str`.
**Issue:** N/A
**Dependencies:** N/A
**Twitter handle:** N/A
Signed-off-by: Hemslo Wang <hemslo.wang@gmail.com>
2024-03-01 12:08:20 -08:00
Christophe Bornet
177f51c7bd
community: Use default load() implementation in doc loaders ( #18385 )
...
Following https://github.com/langchain-ai/langchain/pull/18289
2024-03-01 14:46:52 -05:00
Robby
ece4b43a81
community[patch]: doc loaders mypy fixes ( #17368 )
...
**Description:** Fixed `type: ignore`'s for mypy for some
document_loaders.
**Issue:** [Remove "type: ignore" comments #17048
](https://github.com/langchain-ai/langchain/issues/17048 )
---------
Co-authored-by: Robby <h0rv@users.noreply.github.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-02-12 16:51:06 -08:00
Bagatur
af74301ab9
core[patch], community[patch]: link extraction continue on failure ( #17200 )
2024-02-07 14:15:30 -08:00
Bagatur
ed58eeb9c5
community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community ( #14463 )
...
Moved the following modules to new package langchain-community in a backwards compatible fashion:
```
mv langchain/langchain/adapters community/langchain_community
mv langchain/langchain/callbacks community/langchain_community/callbacks
mv langchain/langchain/chat_loaders community/langchain_community
mv langchain/langchain/chat_models community/langchain_community
mv langchain/langchain/document_loaders community/langchain_community
mv langchain/langchain/docstore community/langchain_community
mv langchain/langchain/document_transformers community/langchain_community
mv langchain/langchain/embeddings community/langchain_community
mv langchain/langchain/graphs community/langchain_community
mv langchain/langchain/llms community/langchain_community
mv langchain/langchain/memory/chat_message_histories community/langchain_community
mv langchain/langchain/retrievers community/langchain_community
mv langchain/langchain/storage community/langchain_community
mv langchain/langchain/tools community/langchain_community
mv langchain/langchain/utilities community/langchain_community
mv langchain/langchain/vectorstores community/langchain_community
mv langchain/langchain/agents/agent_toolkits community/langchain_community
mv langchain/langchain/cache.py community/langchain_community
mv langchain/langchain/adapters community/langchain_community
mv langchain/langchain/callbacks community/langchain_community/callbacks
mv langchain/langchain/chat_loaders community/langchain_community
mv langchain/langchain/chat_models community/langchain_community
mv langchain/langchain/document_loaders community/langchain_community
mv langchain/langchain/docstore community/langchain_community
mv langchain/langchain/document_transformers community/langchain_community
mv langchain/langchain/embeddings community/langchain_community
mv langchain/langchain/graphs community/langchain_community
mv langchain/langchain/llms community/langchain_community
mv langchain/langchain/memory/chat_message_histories community/langchain_community
mv langchain/langchain/retrievers community/langchain_community
mv langchain/langchain/storage community/langchain_community
mv langchain/langchain/tools community/langchain_community
mv langchain/langchain/utilities community/langchain_community
mv langchain/langchain/vectorstores community/langchain_community
mv langchain/langchain/agents/agent_toolkits community/langchain_community
mv langchain/langchain/cache.py community/langchain_community
```
Moved the following to core
```
mv langchain/langchain/utils/json_schema.py core/langchain_core/utils
mv langchain/langchain/utils/html.py core/langchain_core/utils
mv langchain/langchain/utils/strings.py core/langchain_core/utils
cat langchain/langchain/utils/env.py >> core/langchain_core/utils/env.py
rm langchain/langchain/utils/env.py
```
See .scripts/community_split/script_integrations.sh for all changes
2023-12-11 13:53:30 -08:00