langchain

mirror of https://github.com/hwchase17/langchain synced 2024-11-10 01:10:59 +00:00

Martin Triska 2df8ac402a community[minor]: Added propagation of document metadata from O365BaseLoader (#20663)

Description:
- Added propagation of document metadata from O365BaseLoader to FileSystemBlobLoader (O365BaseLoader uses FileSystemBlobLoader under the hood).
- This is done by passing dictionary `metadata_dict`: key=filename and value=dictionary containing document's metadata
- Modified `FileSystemBlobLoader` to accept the `metadata_dict`, use `mimetype` from it (if available) and pass metadata further into blob loader.

Issue:
- `O365BaseLoader` under the hood downloads documents to temp folder and then uses `FileSystemBlobLoader` on it.
- However metadata about the document in question is lost in this process. In particular:
  - `mime_type`: `FileSystemBlobLoader` guesses `mime_type` from the file extension, but that does not work 100% of the time.
  - `web_url`: this is useful to keep around since in RAG LLM we might want to provide link to the source document. In order to work well with document parsers, we pass the `web_url` as `source` (`web_url` is ignored by parsers, `source` is preserved)

Dependencies: None

Twitter handle: @martintriska1

Please review @baskaryan

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>

2024-05-23 11:42:19 -04:00
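The metadata propagation described in the commit message above can be illustrated with a small standalone sketch. It does not import langchain and is not the actual `FileSystemBlobLoader` code: `SketchBlob` and `yield_blobs_with_metadata` are hypothetical names, and the choice to keep `mime_type` inside the metadata while re-keying `web_url` as `source` is an assumption made for illustration. The only parts taken from the commit message are the shape of `metadata_dict` (filename mapped to a metadata dictionary), the preference for an explicit mime type over an extension-based guess, and the idea of exposing the web URL as `source`.

```python
import mimetypes
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, Iterator, Optional


@dataclass
class SketchBlob:
    """Hypothetical stand-in for a blob: a file path plus mimetype and metadata."""

    path: Path
    mimetype: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


def yield_blobs_with_metadata(
    folder: str,
    metadata_dict: Optional[Dict[str, Dict[str, Any]]] = None,
) -> Iterator[SketchBlob]:
    """Yield one blob per file, attaching metadata looked up by filename."""
    metadata_dict = metadata_dict or {}
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        # Copy so the caller's dictionary is never mutated.
        metadata = dict(metadata_dict.get(path.name, {}))
        # Prefer the explicit mime type; fall back to guessing from the extension.
        mimetype = metadata.get("mime_type") or mimetypes.guess_type(path.name)[0]
        # Assumption: expose the web URL under "source" so downstream parsers keep it.
        if "web_url" in metadata:
            metadata["source"] = metadata.pop("web_url")
        yield SketchBlob(path=path, mimetype=mimetype, metadata=metadata)


# Usage: the ".bin" extension would be guessed wrongly, so the explicit value wins.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "report.bin").write_bytes(b"%PDF-1.4")
    (Path(tmp) / "notes.txt").write_text("hello")
    downloaded_metadata = {
        "report.bin": {
            "mime_type": "application/pdf",
            "web_url": "https://example.com/report",
        }
    }
    blobs = list(yield_blobs_with_metadata(tmp, downloaded_metadata))
```

Files without an entry in the dictionary still get a guessed mimetype and empty metadata, which mirrors the behavior the commit message describes as the previous default.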
adapters	community[patch]: upgrade to recent version of mypy (#21616)	2024-05-13 14:55:07 -04:00
agent_toolkits	community[patch]: Fix remaining __inits__ in community (#22037)	2024-05-22 17:42:17 +00:00
agents	langchain, community: move OpenAIAssistantV2Runnable to community (#22044)	2024-05-22 21:22:50 +00:00
callbacks	infra: rm unused # noqa violations (#22049)	2024-05-22 15:21:08 -07:00
chains	langchain[minor]: Add PebbloRetrievalQA chain with Identity & Semantic Enforcement support (#20641)	2024-05-15 13:14:52 +00:00
chat_loaders	infra: rm unused # noqa violations (#22049)	2024-05-22 15:21:08 -07:00
chat_message_histories	community[minor]: Add async methods to CassandraChatMessageHistory (#21975)	2024-05-23 10:13:05 -04:00
chat_models	infra: rm unused # noqa violations (#22049)	2024-05-22 15:21:08 -07:00
cross_encoders	multiple: langchain 0.2 in master (#21191)	2024-05-08 16:46:52 -04:00
docstore	community[patch]: Fix remaining __inits__ in community (#22037)	2024-05-22 17:42:17 +00:00
document_compressors	langchain: add RankLLM Reranker (#21171)	2024-05-22 20:12:55 +00:00
document_loaders	community[minor]: Added propagation of document metadata from O365BaseLoader (#20663)	2024-05-23 11:42:19 -04:00
document_transformers	infra: rm unused # noqa violations (#22049)	2024-05-22 15:21:08 -07:00
embeddings	infra: rm unused # noqa violations (#22049)	2024-05-22 15:21:08 -07:00
example_selectors
graphs	infra: rm unused # noqa violations (#22049)	2024-05-22 15:21:08 -07:00
indexes	community[patch]: upgrade to recent version of mypy (#21616)	2024-05-13 14:55:07 -04:00
llms	infra: rm unused # noqa violations (#22049)	2024-05-22 15:21:08 -07:00
memory
output_parsers	infra: rm unused # noqa violations (#22049)	2024-05-22 15:21:08 -07:00
query_constructors	multiple: langchain 0.2 in master (#21191)	2024-05-08 16:46:52 -04:00
retrievers	infra: rm unused # noqa violations (#22049)	2024-05-22 15:21:08 -07:00
storage	community[minor]: Add Cassandra ByteStore (#22064)	2024-05-23 10:46:23 -04:00
tools	community[patch]: Adding HEADER to the list of supported locations (#21946)	2024-05-22 22:47:56 +00:00
utilities	community[minor]: Add Cassandra ByteStore (#22064)	2024-05-23 10:46:23 -04:00
utils
vectorstores	community[patch]: surrealdb provide functions for MMR (Maximal Marginal Relevance) (#21185)	2024-05-22 22:53:55 +00:00
__init__.py
cache.py	community: init signature revision for Cassandra LLM cache classes + small maintenance (#17765)	2024-05-16 17:22:24 +00:00
py.typed