langchain/libs/community/tests/unit_tests
Peter Vandenabeele e830a4e731
community[patch]: Add remove_comments option (default True): do not extract html comments (#13259)
  - **Description:** add `remove_comments` option (default: True): do not
extract html _comments_,
  - **Issue:** None,
  - **Dependencies:** None,
  - **Tag maintainer:** @nfcampos ,
  - **Twitter handle:** peter_v

I ran `make format`, `make lint` and `make test`.

Discussion: In my use case, I prefer not to have the comments in the
extracted text:
* e.g. a Google tag that is added to the HTML as a comment
* e.g. content that the authors have temporarily hidden to make it
invisible to the regular reader

Removing the comments brings the extracted text closer to the text the
reader is actually meant to see.
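The effect of the option can be illustrated with a minimal sketch built on Python's standard-library `html.parser`. This is not the LangChain implementation: `TextExtractor`, `extract_text` and the sample page are hypothetical; only the `remove_comments` flag name and its True default are taken from the description above.

```python
from html.parser import HTMLParser
from typing import List


class TextExtractor(HTMLParser):
    """Hypothetical extractor: collects text, optionally keeping HTML comments."""

    def __init__(self, remove_comments: bool = True) -> None:
        super().__init__()
        self.remove_comments = remove_comments
        self.chunks: List[str] = []

    def handle_data(self, data: str) -> None:
        # Regular text between tags is always kept.
        text = data.strip()
        if text:
            self.chunks.append(text)

    def handle_comment(self, data: str) -> None:
        # Called for <!-- ... --> with the inner text; dropped when remove_comments=True.
        if not self.remove_comments:
            text = data.strip()
            if text:
                self.chunks.append(text)


def extract_text(html: str, remove_comments: bool = True) -> str:
    parser = TextExtractor(remove_comments=remove_comments)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page: a tag-manager comment and a temporarily hidden draft paragraph.
page = (
    "<html><body>"
    "<!-- Google Tag Manager -->"
    "<p>Welcome to the blog.</p>"
    "<!-- <p>Draft paragraph, hidden for now.</p> -->"
    "</body></html>"
)

print(extract_text(page))
# → Welcome to the blog.
print(extract_text(page, remove_comments=False))
# → Google Tag Manager Welcome to the blog. <p>Draft paragraph, hidden for now.</p>
```

With comments kept, both the tracking note and the hidden draft (including its raw markup) leak into the extracted text, which is the noise the option is meant to avoid.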


**Choice to make:** should the default for this
`remove_comments` option be True or False?
I changed it to True in a second commit, since that is how I would
prefer to use it by default: the extracted text is cleaner (without
technical Google tags, etc.) and closer to the content that is
actually visible and intended.
I am not sure what is best aligned with the conventions of langchain in
general ...


INITIAL VERSION (new version above):
~**Choice to make:** do we prefer to make the default for this
`ignore_comments` option to be True or False?
I have set it to False now to be backwards compatible. On the other
hand, I would use it mostly with True.
I am not sure what is best aligned with the conventions of langchain in
general ...~

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
6 months ago
| Name | Last commit message | Last updated |
|---|---|---|
| `agent_toolkits` | community[minor]: Update Azure Cognitive Services to Azure AI Services (#19488) | 6 months ago |
| `callbacks` | community[minor] : adds callback handler for Fiddler AI (#17708) | 7 months ago |
| `chat_loaders` | Restore self message sent before OSX 12 Monterey (#14818) | 9 months ago |
| `chat_message_histories` | community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) | 10 months ago |
| `chat_models` | community[minor]: Update ChatZhipuAI to support GLM-4 model (#16695) | 6 months ago |
| `docstore` | community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) | 10 months ago |
| `document_loaders` | community[minor]: add support for llmsherpa (#19741) | 6 months ago |
| `document_transformers` | community[patch]: Add remove_comments option (default True): do not extract html comments (#13259) | 6 months ago |
| `embeddings` | community[patch]: OllamaEmbeddings - Pass headers to post request (#16880) | 6 months ago |
| `examples` | community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) | 10 months ago |
| `graphs` | comunity[patch]: Fix neo4j sanitizing values (#18750) | 7 months ago |
| `indexes` | community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) | 10 months ago |
| `llms` | community[patch]: introduce convert_to_secret() to bananadev llm (#14283) | 6 months ago |
| `retrievers` | community[minor]: Add Dria retriever (#17098) | 6 months ago |
| `storage` | add mongodb_store (#13801) | 8 months ago |
| `tools` | community[minor]: add hugging face text-to-speech inference API (#18880) | 6 months ago |
| `utilities` | community[minor]: Add Dria retriever (#17098) | 6 months ago |
| `utils` | community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) | 10 months ago |
| `vectorstores` | community[minor]: Pathway vectorstore(#14859) | 6 months ago |
| `__init__.py` | community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) | 10 months ago |
| `conftest.py` | community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) | 10 months ago |
| `test_dependencies.py` | community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) | 10 months ago |
| `test_imports.py` | infra: Fix test filesystem paths incompatible with windows (#14388) | 10 months ago |
| `test_sql_database.py` | community[minor]: Add lazy_table_reflection param to SqlDatabase (#18742) | 7 months ago |
| `test_sql_database_schema.py` | infra: mv SQLDatabase tests to community (#17276) | 8 months ago |
| `test_sqlalchemy.py` | community[major], core[patch], langchain[patch], experimental[patch]: Create langchain-community (#14463) | 10 months ago |