langchain

mirror of https://github.com/hwchase17/langchain synced 2024-11-10 01:10:59 +00:00

Lei Zhang 748a6ae609 community[patch]: add HTTP response headers Content-Type to metadata of RecursiveUrlLoader document (#20875)	2024-04-25 11:29:41 -07:00

Description: The RecursiveUrlLoader loader offers a link_regex parameter that can filter out URLs. However, this filtering capability is limited, and if the internal links of the website change, unexpected resources may be loaded. These resources, such as font files, can cause problems in subsequent embedding processing.

> https://blog.langchain.dev/assets/fonts/source-sans-pro-v21-latin-ext_latin-regular.woff2?v=0312715cbf

We can add the Content-Type in the HTTP response headers to the document metadata so developers can choose which resources to use. This allows developers to make their own choices.

For example, the following may be a good choice for text knowledge.

- text/plain - simple text file
- text/html - HTML web page
- text/xml - XML format file
- text/json - JSON format data
- application/pdf - PDF file
- application/msword - Word document

and ignore the following

- text/css - CSS stylesheet
- text/javascript - JavaScript script
- application/octet-stream - binary data
- image/jpeg - JPEG image
- image/png - PNG image
- image/gif - GIF image
- image/svg+xml - SVG image
- audio/mpeg - MPEG audio files
- video/mp4 - MP4 video file
- application/font-woff - WOFF font file
- application/font-ttf - TTF font file
- application/zip - ZIP compressed file

Twitter handle: @coolbeevip

---------

Co-authored-by: Bagatur <baskaryan@gmail.com>
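The filtering this commit message describes could look roughly like the sketch below. It is illustrative only: `Doc` is a hypothetical stand-in for the loader's document objects, and the metadata key name `content_type` is an assumption that should be checked against the loader's actual output. The allow-list is copied from the commit message's "good choice for text knowledge" examples.

```python
from dataclasses import dataclass, field


# Hypothetical stand-in for a loaded document (not the library's class).
@dataclass
class Doc:
    page_content: str
    metadata: dict = field(default_factory=dict)


# Allow-list taken from the examples in the commit message.
ALLOWED_TYPES = {
    "text/plain",
    "text/html",
    "text/xml",
    "text/json",
    "application/pdf",
    "application/msword",
}


def media_type(content_type):
    """Return the bare media type, dropping parameters such as '; charset=utf-8'."""
    if not content_type:
        return ""
    return content_type.split(";", 1)[0].strip().lower()


def keep_text_docs(docs, key="content_type"):
    """Keep only documents whose Content-Type is in the allow-list.

    `key` is an assumed metadata key name; documents without it are dropped.
    """
    return [d for d in docs if media_type(d.metadata.get(key)) in ALLOWED_TYPES]


docs = [
    Doc("<html>...</html>", {"content_type": "text/html; charset=utf-8"}),
    Doc("binary font data", {"content_type": "font/woff2"}),
    Doc("no header recorded", {}),
]
kept = keep_text_docs(docs)
```

An allow-list is used here rather than a deny-list so that unexpected resource types (such as the font file in the example URL) are excluded by default.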
..
adapters
agent_toolkits	community[patch],core[minor]: Move BaseToolKit to core.tools (#20669 )	2024-04-22 14:04:30 -04:00
callbacks	patch: remove usage of llm, chat model __call__ (#20788 )	2024-04-24 19:39:23 -04:00
chat_loaders	community[patch]: import flattening fix (#20110 )	2024-04-10 13:01:19 -04:00
chat_message_histories	core[patch],community[patch]: Move file chat history back to community (#20834 )	2024-04-24 12:47:25 -04:00
chat_models	patch: remove usage of llm, chat model __call__ (#20788 )	2024-04-24 19:39:23 -04:00
cross_encoders	community[patch]: `cross_encoders` flatten namespaces (#20183 )	2024-04-08 20:50:23 -04:00
docstore	community[patch]: docstrings update (#20301 )	2024-04-11 16:23:27 -04:00
document_compressors	community[minor]: add Jina Reranker in retrievers module (#19406 )	2024-04-25 10:27:10 -07:00
document_loaders	community[patch]: add HTTP response headers Content-Type to metadata of RecursiveUrlLoader document (#20875 )	2024-04-25 11:29:41 -07:00
document_transformers	community[patch]: add BeautifulSoupTransformer remove_unwanted_classnames method (#20467 )	2024-04-25 17:04:04 +00:00
embeddings	community[patch]: YandexGPT API add ability to disable request logging (#20670 )	2024-04-19 21:40:37 -04:00
example_selectors
graphs	community[patch]: Add driver config param for neo4j graph (#20772 )	2024-04-24 21:14:41 +00:00
indexes	community[patch]: docstrings update (#20301 )	2024-04-11 16:23:27 -04:00
llms	patch: remove usage of llm, chat model __call__ (#20788 )	2024-04-24 19:39:23 -04:00
output_parsers
retrievers	patch: deprecate (a)get_relevant_documents (#20477 )	2024-04-22 11:14:53 -04:00
storage	community[patch]: import flattening fix (#20110 )	2024-04-10 13:01:19 -04:00
tools	community[patch]: deprecating remaining google_community integrations (#20471 )	2024-04-15 09:57:12 -04:00
utilities	community[minor]: Add async methods to CassandraVectorStore (#20602 )	2024-04-20 02:09:58 +00:00
utils	community[patch]: docstrings update (#20301 )	2024-04-11 16:23:27 -04:00
vectorstores	core[minor], langchain[patch], community[patch]: mv StructuredQuery (#20849 )	2024-04-25 09:40:26 -07:00
__init__.py
cache.py	patch: remove usage of llm, chat model __call__ (#20788 )	2024-04-24 19:39:23 -04:00
py.typed