Update multi-modal template README.md (#14991)

Commit 448b4d3522 (parent ca0a75e1fc) by Lance Martin, committed via GitHub

@@ -1,13 +1,15 @@
# rag-chroma-multi-modal
Multi-modal LLMs enable visual assistants that can perform question-answering about images.

You can ask questions in natural language about a collection of photos, retrieve relevant ones, and have a multi-modal LLM answer questions about the retrieved images.

This template creates a visual assistant for slide decks, which often contain visuals such as graphs or figures; it performs text-to-image retrieval for question-answering over visual elements that standard RAG does not capture.
It uses OpenCLIP embeddings to embed all of the slide images and stores them in Chroma.
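
As a rough sketch, ingestion might look like the following. This is not taken from the template itself: the `OpenCLIPEmbeddings` class, Chroma's `add_images`, and the paths shown are assumptions based on the stack this README names.

```python
from pathlib import Path

from langchain_community.vectorstores import Chroma
from langchain_experimental.open_clip import OpenCLIPEmbeddings

# Collect slide images exported from the deck (illustrative path).
image_uris = sorted(str(p) for p in Path("docs/slides").glob("*.jpg"))

# Chroma collection backed by OpenCLIP image/text embeddings,
# so plain-text questions can retrieve images.
vectorstore = Chroma(
    collection_name="multi-modal-rag",
    persist_directory="./chroma_db",
    embedding_function=OpenCLIPEmbeddings(),
)

# Embed and index the raw slide images.
vectorstore.add_images(uris=image_uris)
```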
Given a question, relevant slides are retrieved and passed to GPT-4V for answer synthesis.
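
Continuing the sketch above, the synthesis step might look like this. The base64 handling is an assumption (Chroma is assumed to return the base64-encoded image as `page_content`), as are the model name and the example question.

```python
from langchain_community.chat_models import ChatOpenAI
from langchain_core.messages import HumanMessage

retriever = vectorstore.as_retriever(search_kwargs={"k": 1})

question = "What is the projected revenue growth?"  # illustrative question
docs = retriever.invoke(question)

# Build a multi-modal message: the question plus each retrieved slide image.
content = [{"type": "text", "text": question}] + [
    {
        "type": "image_url",
        # Assumes the store returns the base64-encoded image as page_content.
        "image_url": {"url": f"data:image/jpeg;base64,{doc.page_content}"},
    }
    for doc in docs
]

llm = ChatOpenAI(model="gpt-4-vision-preview", max_tokens=1024)
answer = llm.invoke([HumanMessage(content=content)])
print(answer.content)
```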
![mm-mmembd](https://github.com/langchain-ai/langchain/assets/122662504/b3bc8406-48ae-4707-9edf-d0b3a511b200)
## Input
@@ -112,4 +114,4 @@ We can access the template from code with:
```python
from langserve.client import RemoteRunnable

runnable = RemoteRunnable("http://localhost:8000/rag-chroma-multi-modal")
```
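
Once connected, it can be invoked like any other runnable (the question is illustrative):

```python
runnable.invoke("How has revenue trended across the deck?")
```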
