langchain/libs/community
Martin Triska 2df8ac402a
community[minor]: Added propagation of document metadata from O365BaseLoader (#20663)
**Description:**
- Added propagation of document metadata from `O365BaseLoader` to
`FileSystemBlobLoader` (`O365BaseLoader` uses `FileSystemBlobLoader` under the
hood).
- This is done by passing a dictionary, `metadata_dict`, whose keys are
filenames and whose values are dictionaries containing each document's metadata.
- Modified `FileSystemBlobLoader` to accept `metadata_dict`, use the
`mimetype` from it (if available), and pass the metadata on to the blob
loader.
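
The lookup described above can be sketched in plain Python. This is a minimal illustration only, not the code from this PR: `SimpleBlob` and `blobs_with_metadata` are hypothetical names, and the `mime_type` and `source` keys are assumed from the wording of this description.

```python
import mimetypes
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, Iterable, Iterator, Optional


@dataclass
class SimpleBlob:
    """Hypothetical stand-in for a blob: a path plus MIME type and metadata."""

    path: str
    mimetype: Optional[str] = None
    metadata: Dict[str, str] = field(default_factory=dict)


def blobs_with_metadata(
    paths: Iterable[str],
    metadata_dict: Optional[Dict[str, Dict[str, str]]] = None,
) -> Iterator[SimpleBlob]:
    """Yield one SimpleBlob per path, enriched from metadata_dict if present."""
    metadata_dict = metadata_dict or {}
    for path in paths:
        # Per-document metadata is looked up by filename (the dictionary key).
        doc_meta = dict(metadata_dict.get(Path(path).name, {}))
        # Prefer the supplied MIME type (assumed key: "mime_type");
        # otherwise fall back to guessing from the file extension.
        mimetype = doc_meta.pop("mime_type", None) or mimetypes.guess_type(path)[0]
        yield SimpleBlob(path=path, mimetype=mimetype, metadata=doc_meta)


# Example: one downloaded file without an extension, one with.
metadata_dict = {
    "report": {
        "mime_type": "application/pdf",
        "source": "https://example.com/report",
    },
}
blobs = list(blobs_with_metadata(["/tmp/report", "/tmp/notes.txt"], metadata_dict))
```

In this sketch the first blob takes its MIME type and `source` from `metadata_dict`, while the second has no entry and falls back to extension-based guessing with empty metadata.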

**Issue:**
- Under the hood, `O365BaseLoader` downloads documents to a temporary folder
and then runs `FileSystemBlobLoader` on it.
- However, metadata about each document is lost in this process. In
particular:
  - `mime_type`: `FileSystemBlobLoader` guesses the `mime_type` from the file
  extension, which is not always reliable.
  - `web_url`: this is worth keeping because a RAG LLM application might
  want to link to the source document. To work well with document parsers,
  `web_url` is passed as `source` (parsers ignore `web_url` but preserve
  `source`).
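
Both problems can be reproduced with the standard library alone. The sketch below is an illustration under assumptions, not the loader's actual code: `guess_mime_type` and `to_parser_metadata` are hypothetical helpers, and the file names and URL are made up.

```python
import mimetypes
from typing import Dict, Optional


def guess_mime_type(filename: str) -> Optional[str]:
    """Guess a MIME type from the file extension only."""
    return mimetypes.guess_type(filename)[0]


def to_parser_metadata(o365_meta: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the metadata with `web_url` renamed to `source`."""
    meta = dict(o365_meta)
    if "web_url" in meta:
        # Per the description above, parsers preserve `source`
        # but ignore `web_url`.
        meta["source"] = meta.pop("web_url")
    return meta


# A file without an extension gives the guesser nothing to work with.
no_extension = guess_mime_type("quarterly-report")
with_extension = guess_mime_type("notes.txt")

parser_meta = to_parser_metadata(
    {"web_url": "https://example.com/doc", "mime_type": "application/pdf"}
)
```

Here `no_extension` is `None`, which is the case where a MIME type supplied by the source system is the only reliable value.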

**Dependencies:**
None

**Twitter handle:**
@martintriska1

Please review @baskaryan

---------

Co-authored-by: Bagatur <22008038+baskaryan@users.noreply.github.com>
Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
2024-05-23 11:42:19 -04:00
| Name | Last commit | Date |
|---|---|---|
| langchain_community | community[minor]: Added propagation of document metadata from O365BaseLoader (#20663) | 2024-05-23 11:42:19 -04:00 |
| scripts | langchain[patch],community[minor]: Move some unit tests from langchain to community, use core for fake models (#21190) | 2024-05-02 09:57:52 -04:00 |
| tests | community[minor]: Add CloudBlobLoader that supports loading data from cloud buckets (#21957) | 2024-05-23 10:59:55 -04:00 |
| Makefile | community[minor]: add Kinetica LLM wrapper (#17879) | 2024-02-22 16:02:00 -08:00 |
| poetry.lock | community[minor]: Add CloudBlobLoader that supports loading data from cloud buckets (#21957) | 2024-05-23 10:59:55 -04:00 |
| pyproject.toml | community[minor]: Add CloudBlobLoader that supports loading data from cloud buckets (#21957) | 2024-05-23 10:59:55 -04:00 |
| README.md | Batch update of alt text and title attributes for images in md/mdx files across repo (#15357) | 2024-01-12 14:37:48 -08:00 |

🦜🧑‍🤝‍🧑 LangChain Community

License: MIT

Quick Install

pip install langchain-community

What is it?

LangChain Community contains third-party integrations that implement the base interfaces defined in LangChain Core, making them ready-to-use in any LangChain application.

For full documentation see the API reference.

Diagram outlining the hierarchical organization of the LangChain framework, displaying the interconnected parts across multiple layers.

📕 Releases & Versioning

langchain-community is currently on version 0.0.x

All changes will be accompanied by a patch version increase.

💁 Contributing

As an open-source project in a rapidly developing field, we are extremely open to contributions, whether in the form of a new feature, improved infrastructure, or better documentation.

For detailed information on how to contribute, see the Contributing Guide.