langchain/libs/community
Peter Vandenabeele e830a4e731
community[patch]: Add remove_comments option (default True): do not extract html comments (#13259)
- **Description:** add `remove_comments` option (default: True): do not
  extract html _comments_,
- **Issue:** None,
- **Dependencies:** None,
- **Tag maintainer:** @nfcampos,
- **Twitter handle:** peter_v

I ran `make format`, `make lint` and `make test`.

Discussion: In my use case, I prefer not to have the comments in the
extracted text:
* e.g. a Google tag that is added to the HTML as a comment
* e.g. content that the authors have temporarily hidden so that it is not
visible to the regular reader

Removing the comments brings the extracted text closer to the text the
reader is intended to see.
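
The effect of the option can be illustrated with a minimal sketch that uses only Python's standard-library `html.parser`. The `extract_text` function and its `remove_comments` parameter below are hypothetical illustrations, not the LangChain implementation; the sketch assumes comments are dropped by default, matching the default proposed in this change.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes; comment bodies are kept only if requested."""

    def __init__(self, remove_comments: bool) -> None:
        super().__init__()
        self.remove_comments = remove_comments
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        # Regular text between tags is always kept.
        if data.strip():
            self.parts.append(data.strip())

    def handle_comment(self, data: str) -> None:
        # <!-- ... --> bodies reach the output only when remove_comments is False.
        if not self.remove_comments and data.strip():
            self.parts.append(data.strip())


def extract_text(html: str, remove_comments: bool = True) -> str:
    """Hypothetical helper: join visible text, optionally including comments."""
    parser = _TextExtractor(remove_comments)
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


sample_html = (
    "<p>Hello</p><!-- Google Tag Manager -->"
    "<p>world</p><!-- <p>draft paragraph</p> -->"
)

print(extract_text(sample_html))
print(extract_text(sample_html, remove_comments=False))
```

With the default, only "Hello world" survives; with `remove_comments=False`, the tag marker and the hidden draft paragraph (including its raw markup) leak into the extracted text, which is the noise the option is meant to avoid.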


**Choice to make:** should the default for this `remove_comments`
option be True or False?
I changed it to True in a second commit, since that is how I would
prefer to use it by default: the extracted text is cleaner (no technical
Google tags etc.) and closer to the content that is actually visible and
intended.
I am not sure which choice is best aligned with LangChain's conventions in
general ...


INITIAL VERSION (new version above):
~**Choice to make:** should the default for this
`ignore_comments` option be True or False?
I have set it to False for now to be backwards compatible. On the other
hand, I would mostly use it with True.
I am not sure which choice is best aligned with LangChain's conventions in
general ...~

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
2024-04-02 00:19:12 +00:00
langchain_community community[patch]: Add remove_comments option (default True): do not extract html comments (#13259) 2024-04-02 00:19:12 +00:00
scripts infra: add print rule to ruff (#16221) 2024-02-09 16:13:30 -08:00
tests community[patch]: Add remove_comments option (default True): do not extract html comments (#13259) 2024-04-02 00:19:12 +00:00
Makefile community[minor]: add Kinetica LLM wrapper (#17879) 2024-02-22 16:02:00 -08:00
poetry.lock community[minor]: Update ChatZhipuAI to support GLM-4 model (#16695) 2024-04-01 18:11:21 +00:00
pyproject.toml community[minor]: Update ChatZhipuAI to support GLM-4 model (#16695) 2024-04-01 18:11:21 +00:00
README.md Batch update of alt text and title attributes for images in md/mdx files across repo (#15357) 2024-01-12 14:37:48 -08:00

🦜🧑‍🤝‍🧑 LangChain Community

License: MIT

Quick Install

pip install langchain-community

What is it?

LangChain Community contains third-party integrations that implement the base interfaces defined in LangChain Core, making them ready-to-use in any LangChain application.
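
The split between core interfaces and community integrations can be pictured with a small sketch. All class and function names here are hypothetical stand-ins, not the real `langchain_core` base classes; the point is only that application code depends on an abstract interface while an integration supplies the concrete behavior.

```python
from abc import ABC, abstractmethod
from typing import Iterator


class BaseTextSource(ABC):
    """Hypothetical stand-in for an interface defined in a 'core' package."""

    @abstractmethod
    def lazy_load(self) -> Iterator[str]:
        """Yield documents one at a time."""

    def load(self) -> list[str]:
        # Shared behavior lives in the base class, so every integration gets it.
        return list(self.lazy_load())


class InMemoryTextSource(BaseTextSource):
    """Hypothetical 'community' integration implementing the interface."""

    def __init__(self, texts: list[str]) -> None:
        self.texts = texts

    def lazy_load(self) -> Iterator[str]:
        yield from self.texts


def count_documents(source: BaseTextSource) -> int:
    # Application code only relies on the interface, not on a specific integration.
    return len(source.load())


print(count_documents(InMemoryTextSource(["first", "second"])))
```

Because `count_documents` accepts any `BaseTextSource`, swapping one integration for another requires no change to the calling code, which is what makes integrations "ready-to-use" once they implement the shared interface.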

For full documentation, see the API reference.

[Diagram: the hierarchical organization of the LangChain framework, showing the interconnected parts across multiple layers.]

📕 Releases & Versioning

langchain-community is currently on version 0.0.x.

All changes will be accompanied by a patch version increase.

💁 Contributing

As an open-source project in a rapidly developing field, we are extremely open to contributions, whether it be in the form of a new feature, improved infrastructure, or better documentation.

For detailed information on how to contribute, see the Contributing Guide.