langchain

mirror of https://github.com/hwchase17/langchain synced 2024-11-10 01:10:59 +00:00

Peter Vandenabeele e830a4e731 community[patch]: Add remove_comments option (default True): do not extract html comments (#13259 )		2024-04-02 00:19:12 +00:00

- Description: add `remove_comments` option (default: True): do not extract HTML _comments_
- Issue: None
- Dependencies: None
- Tag maintainer: @nfcampos
- Twitter handle: peter_v

I ran `make format`, `make lint` and `make test`.

Discussion: In my use case, I prefer not to have the comments in the extracted text:
* e.g. from a Google tag that is added to the HTML as a comment
* e.g. content that the authors have temporarily hidden so that it is not visible to the regular reader

Removing the comments brings the extracted text closer to the text the reader is intended to see.

Choice to make: should the default for this `remove_comments` option be True or False? I changed it to True in a second commit, since that is how I would prefer to use it by default: the cleaned text (without technical Google tags etc.) is closer to the actually visible and intended content. I am not sure what is best aligned with the conventions of langchain in general ...

INITIAL VERSION (new version above): ~Choice to make: should the default for this `ignore_comments` option be True or False? I have set it to False for now to be backwards compatible. On the other hand, I would use it mostly with True. I am not sure what is best aligned with the conventions of langchain in general ...~

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
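To illustrate the behavior the commit message describes, here is a minimal sketch that uses only the Python standard library. It is not the code from the commit or from langchain: `TextExtractor` and `extract_text` are hypothetical names, and only the `remove_comments` flag and its default of True are taken from the message above.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text from HTML; hypothetical helper, not langchain's implementation."""

    def __init__(self, remove_comments=True):
        super().__init__()
        self.remove_comments = remove_comments
        self.parts = []

    def handle_data(self, data):
        # Text between tags: keep it if it is not just whitespace.
        text = data.strip()
        if text:
            self.parts.append(text)

    def handle_comment(self, data):
        # Text inside <!-- ... -->: keep it only when comments are not removed.
        if not self.remove_comments:
            text = data.strip()
            if text:
                self.parts.append(text)


def extract_text(html, remove_comments=True):
    """Return the extracted text of `html`, joined with single spaces."""
    parser = TextExtractor(remove_comments=remove_comments)
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


html = "<p>Hello</p><!-- Google tag --><p>world</p>"
print(extract_text(html))                         # comments dropped (default)
print(extract_text(html, remove_comments=False))  # comments kept
```

With the default, the comment text is left out of the result. Passing `remove_comments=False` keeps it in document order, which corresponds to the backwards-compatible behavior discussed in the initial version of the message.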
..
adapters
agent_toolkits	community[patch]: avoid executing `toolkit.get_context()` when not necessary (#19762 )	2024-03-29 16:42:21 +00:00
callbacks	community[patch] : [Fiddler] ensure dataset is not added if model is present (#19293 )	2024-03-25 17:28:05 -07:00
chat_loaders	community[patch]: speed up import times in the community package (#18928 )	2024-03-11 16:37:36 -04:00
chat_message_histories	community[patch]: history size support for DynamoDBChatMessageHistory (#16794 )	2024-03-29 18:56:21 +00:00
chat_models	community[minor]: Update ChatZhipuAI to support GLM-4 model (#16695 )	2024-04-01 18:11:21 +00:00
cross_encoders	langchain[minor], community[minor]: add CrossEncoderReranker with HuggingFaceCrossEncoder and SagemakerEndpointCrossEncoder (#13687 )	2024-03-31 20:51:31 +00:00
docstore	community[patch]: speed up import times in the community package (#18928 )	2024-03-11 16:37:36 -04:00
document_compressors	community[minor]: Add OpenVINO rerank model support (#19791 )	2024-04-01 18:27:23 +00:00
document_loaders	community[minor]: add support for llmsherpa (#19741 )	2024-03-29 16:04:57 -07:00
document_transformers	community[patch]: Add remove_comments option (default True): do not extract html comments (#13259 )	2024-04-02 00:19:12 +00:00
embeddings	community[minor]: Add OpenVINO rerank model support (#19791 )	2024-04-01 18:27:23 +00:00
example_selectors
graphs	community[minor]: Add the option to omit schema refresh in Neo4jGraph (#19654 )	2024-03-27 14:20:12 -04:00
indexes
llms	community[minor]: add Layerup Security integration (#19787 )	2024-04-01 23:49:00 +00:00
output_parsers
retrievers	community[minor]: Add Dria retriever (#17098 )	2024-04-01 12:04:19 -07:00
storage	community[patch]: flattening imports 3 (#18939 )	2024-03-12 15:18:54 -07:00
tools	community[minor]: add hugging face text-to-speech inference API (#18880 )	2024-03-29 15:02:29 +00:00
utilities	community[minor]: Add Dria retriever (#17098 )	2024-04-01 12:04:19 -07:00
utils	community[patch], mongodb[patch]: Stop spamming SIMD import warnings (#19531 )	2024-03-28 03:11:02 +00:00
vectorstores	community[patch]: Revert " Fix the bug that Chroma does not specify `e… (#19866 )	2024-04-01 10:10:44 -07:00
__init__.py
cache.py	core[patch]: fix beta, deprecated typing (#18877 )	2024-03-28 22:33:43 +00:00
py.typed