langchain/tests/unit_tests
Kenton Parton 9124221d31
Fixed handling of absolute URLs in RecursiveUrlLoader (#7677)

## Description
This PR fixes a bug in the `RecursiveUrlLoader` class where absolute
URLs were treated as relative URLs, which produced malformed URLs.
The fix uses the `urljoin` function from the `urllib.parse` module,
which resolves both absolute and relative URLs correctly.
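
As a rough illustration of why `urljoin` helps here, the sketch below compares it with plain string concatenation. The loader's original code is not shown on this page, so `naive_join`, `resolve_link`, and the example URLs are hypothetical stand-ins for the kind of bug described, not the actual `RecursiveUrlLoader` implementation.

```python
from urllib.parse import urljoin

# Assumed base page URL, chosen only for illustration.
BASE_URL = "https://example.com/docs/"


def naive_join(base: str, href: str) -> str:
    """Hypothetical buggy approach: treat every href as relative."""
    return base + href


def resolve_link(base: str, href: str) -> str:
    """Resolve href against base; absolute hrefs are returned unchanged."""
    return urljoin(base, href)


relative = "guide/intro.html"
root_relative = "/about"
absolute = "https://other.example.org/page"

# Concatenation mangles an absolute URL by prefixing the base to it.
print(naive_join(BASE_URL, absolute))
# urljoin handles all three forms of link.
print(resolve_link(BASE_URL, relative))
print(resolve_link(BASE_URL, root_relative))
print(resolve_link(BASE_URL, absolute))
```

With these inputs, concatenation yields `https://example.com/docs/https://other.example.org/page`, while `urljoin` leaves the absolute link untouched and resolves the two relative links against the base.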

@rlancemartin @eyurtsev

---------

Co-authored-by: Lance Martin <lance@langchain.dev>
2023-07-13 15:34:00 -07:00
agents codespell: workflow, config + some (quite a few) typos fixed (#6785) 2023-07-12 16:20:08 -04:00
callbacks [Breaking] Update Evaluation Functionality (#7388) 2023-07-13 02:13:06 -07:00
chains Update the parser regex of map_rerank (#6419) 2023-07-13 03:01:42 -04:00
chat_models Harrison/split schema dir (#7025) 2023-07-01 13:39:19 -04:00
data
docstore Enable InMemoryDocstore to be constructed without providing a dict (#6976) 2023-07-05 16:56:31 -04:00
document_loaders Fixed handling of absolute URLs in RecursiveUrlLoader (#7677) 2023-07-13 15:34:00 -07:00
evaluation Normalize Trajectory Eval Score (#7668) 2023-07-13 09:58:28 -07:00
examples codespell: workflow, config + some (quite a few) typos fixed (#6785) 2023-07-12 16:20:08 -04:00
llms Harrison/split schema dir (#7025) 2023-07-01 13:39:19 -04:00
load Include placeholder value for all secrets, not just kwargs (#6421) 2023-06-19 15:41:45 +01:00
memory Add ZepMemory; improve ZepChatMessageHistory handling of metadata; Fix bugs (#7444) 2023-07-10 01:53:49 -04:00
output_parsers Re-use Trajectory Evaluator (#7248) 2023-07-06 07:00:24 -07:00
prompts Jinja2 validation changed to issue warnings rather than issuing exceptions. (#7161) 2023-07-05 14:04:29 -04:00
retrievers Add serialized object to retriever start callback (#7074) 2023-07-05 18:04:43 +01:00
smith [Breaking] Update Evaluation Functionality (#7388) 2023-07-13 02:13:06 -07:00
tools codespell: workflow, config + some (quite a few) typos fixed (#6785) 2023-07-12 16:20:08 -04:00
utilities Fix graphql tool (#4984) 2023-05-19 15:27:50 -07:00
vectorstores Add maximal relevance search to SKLearnVectorStore (#5430) 2023-05-30 16:13:33 -07:00
__init__.py
conftest.py
test_bash.py Add Mastodon toots loader (#5036) 2023-05-22 16:43:07 -07:00
test_cache.py Fix SQLAlchemy LLM cache clear (#7653) 2023-07-13 09:39:04 -04:00
test_dependencies.py [Breaking] Update Evaluation Functionality (#7388) 2023-07-13 02:13:06 -07:00
test_document_transformers.py Add new types of document transformers (#7379) 2023-07-12 23:53:30 -04:00
test_formatting.py
test_math_utils.py add get_top_k_cosine_similarity method to get max top k score and index (#5059) 2023-05-22 11:55:48 -07:00
test_pytest_config.py Block sockets for unit-tests (#4803) 2023-05-16 14:41:24 -04:00
test_python.py
test_schema.py Harrison/split schema dir (#7025) 2023-07-01 13:39:19 -04:00
test_sql_database_schema.py
test_sql_database.py Fix SQLAlchemy truncating text when it is too big (#5206) 2023-06-01 21:33:31 -04:00
test_sqlalchemy.py unit test sqlalachemy (#7582) 2023-07-12 03:03:16 -04:00
test_text_splitter.py Fix inconsistent behavior of CharacterTextSplitter when changing keep_separator (#7263) 2023-07-06 09:30:03 -04:00
test_utils.py Refac package version check (#7312) 2023-07-07 01:21:53 -04:00