2023-11-07 17:10:24 +00:00
{
"cells": [
{
"attachments": {
"9bbbcfe4-2b85-4e76-996a-ce8d1497d34e.png": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAABnkAAAMxCAYAAAAnrNaWAAAMQGlDQ1BJQ0MgUHJvZmlsZQAASImVVwdYU8kWnluSkJAQIICAlNCbIFIDSAmhBZBeBBshCRBKjIGgYkcXFVy7iIANXRVR7IDYETuLYu+LBRVlXSzYlTcpoOu+8r35vrnz33/O/OfMuTP33gGAfpwnkeSimgDkiQukcaGBzNEpqUzSU0AEdEAFVkCLx8+XsGNiIgEsA+3fy7vrAJG3VxzlWv/s/69FSyDM5wOAxECcLsjn50G8HwC8mi+RFgBAlPMWkwskcgwr0JHCACFeIMeZSlwtx+lKvFthkxDHgbgVADUqjyfNBEDjEuSZhfxMqKHRC7GzWCASA0BnQuyXlzdRAHEaxLbQRgKxXJ+V/oNO5t800wc1ebzMQayci6KoBYnyJbm8qf9nOv53ycuVDfiwhpWaJQ2Lk88Z5u1mzsQIOaZC3CNOj4qGWBviDyKBwh5ilJIlC0tU2qNG/HwOzBnQg9hZwAuKgNgI4hBxblSkik/PEIVwIYYrBJ0iKuAmQKwP8QJhfnC8ymaDdGKcyhfakCHlsFX8WZ5U4Vfu674sJ5Gt0n+dJeSq9DGNoqyEZIgpEFsWipKiINaA2Ck/Jz5CZTOyKIsTNWAjlcXJ47eEOE4oDg1U6mOFGdKQOJV9aV7+wHyxDVkibpQK7y3ISghT5gdr5fMU8cO5YJeEYnbigI4wf3TkwFwEwqBg5dyxZ0JxYrxK54OkIDBOORanSHJjVPa4uTA3VM6bQ+yWXxivGosnFcAFqdTHMyQFMQnKOPGibF54jDIefCmIBBwQBJhABms6mAiygai9p7EH3il7QgAPSEEmEAJHFTMwIlnRI4bXeFAE/oRICPIHxwUqeoWgEPJfB1nl1RFkKHoLFSNywBOI80AEyIX3MsUo8aC3JPAYMqJ/eOfByofx5sIq7//3/AD7nWFDJlLFyAY8MukDlsRgYhAxjBhCtMMNcT/cB4+E1wBYXXAW7jUwj+/2hCeEDsJDwjVCJ+HWBFGx9KcoR4FOqB+iykX6j7nAraGmOx6I+0J1qIzr4YbAEXeDfti4P/TsDlmOKm55Vpg/af9tBj88DZUd2ZmMkoeQA8i2P4/UsNdwH1SR5/rH/ChjTR/MN2ew52f/nB+yL4BtxM+W2AJsH3YGO4Gdww5jjYCJHcOasDbsiBwPrq7HitU14C1OEU8O1BH9w9/Ak5VnMt+5zrnb+Yuyr0A4Rf6OBpyJkqlSUWZWAZMNvwhCJlfMdxrGdHF2cQVA/n1Rvr7exCq+G4he23du7h8A+B7r7+8/9J0LPwbAHk+4/Q9+52xZ8NOhDsDZg3yZtFDJ4fILAb4l6HCnGQATYAFs4XxcgAfwAQEgGISDaJAAUsB4GH0WXOdSMBlMB3NACSgDS8EqUAnWg01gG9gJ9oJGcBicAKfBBXAJXAN34OrpAi9AL3gHPiMIQkJoCAMxQEwRK8QBcUFYiB8SjEQicUgKkoZkImJEhkxH5iJlyHKkEtmI1CJ7kIPICeQc0oHcQh4g3chr5BOKoVRUBzVGrdHhKAtloxFoAjoOzUQnoUXoPHQxWoHWoDvQBvQEegG9hnaiL9A+DGDqmB5mhjliLIyDRWOpWAYmxWZipVg5VoPVY83wOV/BOrEe7CNOxBk4E3eEKzgMT8T5+CR8Jr4Ir8S34Q14K34Ff4D34t8INIIRwYHgTeASRhMyCZMJJYRywhbCAcIpuJe6CO+IRKIe0YboCfdiCjGbOI24iLiWuIt4nNhBfETsI5FIBiQHki8pmsQjFZBKSGtIO0jHSJdJXaQPaupqpmouaiFqqWpitWK1crXtakfVLqs9VftM1iRbkb3J0WQBeSp5CXkzuZl8kdxF/kzRothQfCkJlGzKHEoFpZ5yinKX8kZdXd1c3Us9Vl2kPlu9Qn23+ln1B+ofqdpUeyqHOpYqoy6mbqUep96ivqHRaNa0AF
oqrYC2mFZLO0m7T/ugwdBw0uBqCDRmaVRpNGhc1nhJJ9Ot6Gz6eHoRvZy+j36R3qNJ1rTW5GjyNGdqVmke1Lyh2afF0BqhFa2Vp7VIa7vWOa1n2iRta+1gbYH2PO1N2ie1HzEwhgWDw+Az5jI2M04xunSIOjY6XJ1snTKdnTrtOr262rpuukm6U3SrdI/oduphetZ6XL1cvSV6e/Wu630aYjyEPUQ4ZOGQ+iGXh7zXH6ofoC/UL9XfpX9N/5MB0yDYIMdgmUGjwT1D3NDeMNZwsuE6w1OGPUN1hvoM5Q8tHbp36G0j1MjeKM5omtEmozajPmMT41BjifEa45PGPSZ6JgEm2SYrTY6adJsyTP1MRaYrTY+ZPmfqMtnMXGYFs5XZa2ZkFmYmM9to1m722dzGPNG82HyX+T0LigXLIsNipUWLRa+lqeUoy+mWdZa3rchWLKssq9VWZ6zeW9tYJ1vPt260fmajb8O1KbKps7lrS7P1t51kW2N71Y5ox7LLsVtrd8ketXe3z7Kvsr/ogDp4OIgc1jp0DCMM8xomHlYz7IYj1ZHtWOhY5/jASc8p0qnYqdHp5XDL4anDlw0/M/ybs7tzrvNm5zsjtEeEjyge0TzitYu9C9+lyuWqK801xHWWa5PrKzcHN6HbOreb7gz3Ue7z3Vvcv3p4ekg96j26PS090zyrPW+wdFgxrEWss14Er0CvWV6HvT56e3gXeO/1/svH0SfHZ7vPs5E2I4UjN4985Gvuy/Pd6Nvpx/RL89vg1+lv5s/zr/F/GGARIAjYEvCUbcfOZu9gvwx0DpQGHgh8z/HmzOAcD8KCQoNKg9qDtYMTgyuD74eYh2SG1IX0hrqHTgs9HkYIiwhbFnaDa8zlc2u5veGe4TPCWyOoEfERlREPI+0jpZHNo9BR4aNWjLobZRUljmqMBtHc6BXR92JsYibFHIolxsbEVsU+iRsRNz3uTDwjfkL89vh3CYEJSxLuJNomyhJbkuhJY5Nqk94nByUvT+4cPXz0jNEXUgxTRClNqaTUpNQtqX1jgsesGtM11n1sydjr42zGTRl3brzh+NzxRybQJ/Am7EsjpCWnbU/7wovm1fD60rnp1em9fA5/Nf+FIECwUtAt9BUuFz7N8M1YnvEs0zdzRWZ3ln9WeVaPiCOqFL3KDsten/0+Jzpna05/bnLurjy1vLS8g2JtcY64daLJxCkTOyQOkhJJ5yTvSasm9UojpFvykfxx+U0FOvBHvk1mK/tF9qDQr7Cq8MPkpMn7pmhNEU9pm2o/deHUp0UhRb9Nw6fxp7VMN5s+Z/qDGewZG2ciM9NntsyymDVvVtfs0Nnb5lDm5Mz5vdi5eHnx27nJc5vnGc+bPe/RL6G/1JVolEhLbsz3mb9+Ab5AtKB9oevCNQu/lQpKz5c5l5WXfVnEX3T+1xG/VvzavzhjcfsSjyXrlhKXipdeX+a/bNtyreVFyx+tGLWiYSVzZenKt6smrDpX7la+fjVltWx1Z0VkRdMayzVL13ypzKq8VhVYtavaqHph9fu1grWX1wWsq19vvL5s/acNog03N4ZubKixrinfRNxUuOnJ5qTNZ35j/Va7xXBL2ZavW8VbO7fFbWut9ayt3W60fUkdWier694xdselnUE7m+od6zfu0ttVthvslu1+vidtz/W9EXtb9rH21e+32l99gHGgtAFpmNrQ25jV2NmU0tRxMPxgS7NP84FDToe2HjY7XHVE98iSo5Sj8472Hys61ndccrznROaJRy0TWu6cHH3yamtsa/upiFNnT4ecPnmGfebYWd+zh895nzt4nnW+8YLHhYY297YDv7v/fqDdo73houfFpktel5o7RnYcvex/+cSVoCunr3KvXrgWda3jeuL1mzfG3ui8Kbj57FburVe3C29/vjP7LuFu6T3Ne+X3je7X/GH3x65Oj84jD4IetD2Mf3jnEf/Ri8f5j790zXtCe1L+1PRp7TOXZ4e7Q7ovPR/zvOuF5MXnnp
I/tf6sfmn7cv9fAX+19Y7u7XolfdX/etEbgzdb37q9bemL6bv/Lu/d5/elHww+bPvI+njmU/Knp58nfyF9qfhq97X5
}
},
"cell_type": "markdown",
"id": "812a4dbc-fe04-4b84-bdf9-390045e30806",
"metadata": {},
"source": [
"## Multi-modal RAG\n",
"\n",
"Many documents contain a mixture of content types, including text and images. \n",
"\n",
"Yet, information captured in images is lost in most RAG applications.\n",
"\n",
"With the emergence of multimodal LLMs, like [GPT-4V](https://openai.com/research/gpt-4v-system-card), it is worth considering how to utilize images in RAG:\n",
"\n",
"`Option 1:` \n",
"\n",
"* Use multimodal embeddings (such as [CLIP](https://openai.com/research/clip)) to embed images and text\n",
"* Retrieve both using similarity search\n",
"* Pass raw images and text chunks to a multimodal LLM for answer synthesis \n",
"\n",
"`Option 2:` \n",
"\n",
"* Use a multimodal LLM (such as [GPT-4V](https://openai.com/research/gpt-4v-system-card), [LLaVA](https://llava.hliu.cc/), or [FUYU-8b](https://www.adept.ai/blog/fuyu-8b)) to produce text summaries from images\n",
"* Embed and retrieve text \n",
"* Pass text chunks to an LLM for answer synthesis \n",
"\n",
"`Option 3:` \n",
"\n",
"* Use a multimodal LLM (such as [GPT-4V](https://openai.com/research/gpt-4v-system-card), [LLaVA](https://llava.hliu.cc/), or [FUYU-8b](https://www.adept.ai/blog/fuyu-8b)) to produce text summaries from images\n",
"* Embed and retrieve image summaries with a reference to the raw image \n",
"* Pass raw images and text chunks to a multimodal LLM for answer synthesis \n",
"\n",
"---\n",
"\n",
"This cookbook highlights `Option 3`. \n",
"\n",
"* We will use [Unstructured](https://unstructured.io/) to parse images, text, and tables from documents (PDFs).\n",
"* We will use the [multi-vector retriever](https://python.langchain.com/docs/modules/data_connection/retrievers/multi_vector) with [Chroma](https://www.trychroma.com/) to store raw text and images along with their summaries for retrieval.\n",
"* We will use GPT-4V for both image summarization (for retrieval) and final answer synthesis from joint review of images and texts (or tables).\n",
"\n",
"---\n",
"\n",
"A separate cookbook highlights `Option 1` [here](https://github.com/langchain-ai/langchain/blob/master/cookbook/multi_modal_RAG_chroma.ipynb).\n",
"\n",
"`Option 2` is appropriate for cases when a multi-modal LLM cannot be used for answer synthesis (e.g., because of cost).\n",
"\n",
"![ss_mm_rag.png](attachment:9bbbcfe4-2b85-4e76-996a-ce8d1497d34e.png)\n",
"\n",
"## Packages\n",
"\n",
"In addition to the pip packages below, you will also need `poppler` ([installation instructions](https://pdf2image.readthedocs.io/en/latest/installation.html)) and `tesseract` ([installation instructions](https://tesseract-ocr.github.io/tessdoc/Installation.html)) installed on your system."
]
},
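Before installing the Python packages, it can help to confirm that the two system dependencies are visible on `PATH`. The sketch below is a convenience check and not part of the original notebook: the helper name `check_system_tools` is hypothetical, and it assumes that poppler provides the `pdftoppm` binary and tesseract provides the `tesseract` binary, as standard installs do.

```python
import shutil


def check_system_tools(tools=("pdftoppm", "tesseract")):
    """Return a dict mapping each tool name to True if it is found on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


if __name__ == "__main__":
    # Report which of the assumed binaries can be located
    for tool, found in check_system_tools().items():
        print(f"{tool}: {'found' if found else 'MISSING'}")
```

If either tool is reported as missing, follow the installation instructions linked above before partitioning the PDF.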
{
"cell_type": "code",
"execution_count": null,
"id": "98f9ee74-395f-4aa4-9695-c00ade01195a",
"metadata": {},
"outputs": [],
"source": [
"! pip install -U langchain langchain-community langchain-openai openai chromadb langchain-experimental # (newest versions required for multi-modal)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "140580ef-5db0-43cc-a524-9c39e04d4df0",
"metadata": {},
"outputs": [],
"source": [
"! pip install \"unstructured[all-docs]\" pillow pydantic lxml matplotlib chromadb tiktoken"
]
},
{
"cell_type": "markdown",
"id": "74b56bde-1ba0-4525-a11d-cab02c5659e4",
"metadata": {},
"source": [
"## Data Loading\n",
"\n",
"### Partition PDF tables, text, and images\n",
" \n",
"Let's look at a [popular blog](https://cloudedjudgement.substack.com/p/clouded-judgement-111023) by Jamin Ball.\n",
"\n",
"This is a great use case because much of the information is captured in images (of tables or charts).\n",
"\n",
"We use `Unstructured` to partition it (see [blog post](https://blog.langchain.dev/semi-structured-multi-modal-rag/)).\n",
"\n",
"---\n",
"\n",
"To skip `Unstructured` extraction:\n",
"\n",
"[Here](https://drive.google.com/file/d/1QlhGFIFwEkNEjQGOvV_hQe4bnOLDJwCR/view?usp=sharing) is a zip file with a subset of the extracted images and the PDF.\n",
"\n",
"If you want to use the provided folder, simply use a [pdf loader](https://python.langchain.com/docs/modules/data_connection/document_loaders/pdf) for the document:\n",
"\n",
"```\n",
"from langchain_community.document_loaders import PyPDFLoader\n",
"loader = PyPDFLoader(fpath + fname)\n",
"docs = loader.load()\n",
"tables = [] # Ignore w/ basic pdf loader\n",
"texts = [d.page_content for d in docs]\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c59df23c-86f7-4e5d-8b8c-de92a92f6637",
"metadata": {},
"outputs": [],
"source": [
"from langchain.text_splitter import CharacterTextSplitter\n",
"from unstructured.partition.pdf import partition_pdf\n",
"\n",
"\n",
"# Extract elements from PDF\n",
"def extract_pdf_elements(path, fname):\n",
"    \"\"\"\n",
"    Extract images, tables, and chunk text from a PDF file.\n",
"    path: File path, which is used to dump images (.jpg)\n",
"    fname: File name\n",
"    \"\"\"\n",
"    return partition_pdf(\n",
"        filename=path + fname,\n",
"        extract_images_in_pdf=False,\n",
"        infer_table_structure=True,\n",
"        chunking_strategy=\"by_title\",\n",
"        max_characters=4000,\n",
"        new_after_n_chars=3800,\n",
"        combine_text_under_n_chars=2000,\n",
"        image_output_dir_path=path,\n",
"    )\n",
"\n",
"\n",
"# Categorize elements by type\n",
"def categorize_elements(raw_pdf_elements):\n",
"    \"\"\"\n",
"    Categorize extracted elements from a PDF into tables and texts.\n",
"    raw_pdf_elements: List of unstructured.documents.elements\n",
"    \"\"\"\n",
"    tables = []\n",
"    texts = []\n",
"    for element in raw_pdf_elements:\n",
"        if \"unstructured.documents.elements.Table\" in str(type(element)):\n",
"            tables.append(str(element))\n",
"        elif \"unstructured.documents.elements.CompositeElement\" in str(type(element)):\n",
"            texts.append(str(element))\n",
"    return texts, tables\n",
"\n",
"\n",
"# File path\n",
"fpath = \"/Users/rlm/Desktop/cj/\"\n",
"fname = \"cj.pdf\"\n",
"\n",
"# Get elements\n",
"raw_pdf_elements = extract_pdf_elements(fpath, fname)\n",
"\n",
"# Get text, tables\n",
"texts, tables = categorize_elements(raw_pdf_elements)\n",
"\n",
"# Optional: Enforce a specific token size for texts\n",
"text_splitter = CharacterTextSplitter.from_tiktoken_encoder(\n",
"    chunk_size=4000, chunk_overlap=0\n",
")\n",
"joined_texts = \" \".join(texts)\n",
"texts_4k_token = text_splitter.split_text(joined_texts)"
]
},
{
"cell_type": "markdown",
"id": "0aa7f52f-bf5c-4ba4-af72-b2ccba59a4cf",
"metadata": {},
"source": [
"## Multi-vector retriever\n",
"\n",
"Use [multi-vector-retriever](https://python.langchain.com/docs/modules/data_connection/retrievers/multi_vector#summary) to index image (and / or text, table) summaries, but retrieve raw images (along with raw texts or tables).\n",
"\n",
"### Text and Table summaries\n",
"\n",
"We will use GPT-4 to produce table and, optionally, text summaries.\n",
"\n",
"Text summaries are advised if using large chunk sizes (e.g., as set above, we use 4k token chunks).\n",
"\n",
"Summaries are used to retrieve raw tables and / or raw chunks of text."
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "523e6ed2-2132-4748-bdb7-db765f20648d",
"metadata": {},
"outputs": [],
"source": [
docs[patch], templates[patch]: Import from core (#14575)
Update imports to use core for the low-hanging fruit changes. Ran
following
```bash
git grep -l 'langchain.schema.runnable' {docs,templates,cookbook} | xargs sed -i '' 's/langchain\.schema\.runnable/langchain_core.runnables/g'
git grep -l 'langchain.schema.output_parser' {docs,templates,cookbook} | xargs sed -i '' 's/langchain\.schema\.output_parser/langchain_core.output_parsers/g'
git grep -l 'langchain.schema.messages' {docs,templates,cookbook} | xargs sed -i '' 's/langchain\.schema\.messages/langchain_core.messages/g'
git grep -l 'langchain.schema.chat_histry' {docs,templates,cookbook} | xargs sed -i '' 's/langchain\.schema\.chat_history/langchain_core.chat_history/g'
git grep -l 'langchain.schema.prompt_template' {docs,templates,cookbook} | xargs sed -i '' 's/langchain\.schema\.prompt_template/langchain_core.prompts/g'
git grep -l 'from langchain.pydantic_v1' {docs,templates,cookbook} | xargs sed -i '' 's/from langchain\.pydantic_v1/from langchain_core.pydantic_v1/g'
git grep -l 'from langchain.tools.base' {docs,templates,cookbook} | xargs sed -i '' 's/from langchain\.tools\.base/from langchain_core.tools/g'
git grep -l 'from langchain.chat_models.base' {docs,templates,cookbook} | xargs sed -i '' 's/from langchain\.chat_models.base/from langchain_core.language_models.chat_models/g'
git grep -l 'from langchain.llms.base' {docs,templates,cookbook} | xargs sed -i '' 's/from langchain\.llms\.base\ /from langchain_core.language_models.llms\ /g'
git grep -l 'from langchain.embeddings.base' {docs,templates,cookbook} | xargs sed -i '' 's/from langchain\.embeddings\.base/from langchain_core.embeddings/g'
git grep -l 'from langchain.vectorstores.base' {docs,templates,cookbook} | xargs sed -i '' 's/from langchain\.vectorstores\.base/from langchain_core.vectorstores/g'
git grep -l 'from langchain.agents.tools' {docs,templates,cookbook} | xargs sed -i '' 's/from langchain\.agents\.tools/from langchain_core.tools/g'
git grep -l 'from langchain.schema.output' {docs,templates,cookbook} | xargs sed -i '' 's/from langchain\.schema\.output\ /from langchain_core.outputs\ /g'
git grep -l 'from langchain.schema.embeddings' {docs,templates,cookbook} | xargs sed -i '' 's/from langchain\.schema\.embeddings/from langchain_core.embeddings/g'
git grep -l 'from langchain.schema.document' {docs,templates,cookbook} | xargs sed -i '' 's/from langchain\.schema\.document/from langchain_core.documents/g'
git grep -l 'from langchain.schema.agent' {docs,templates,cookbook} | xargs sed -i '' 's/from langchain\.schema\.agent/from langchain_core.agents/g'
git grep -l 'from langchain.schema.prompt ' {docs,templates,cookbook} | xargs sed -i '' 's/from langchain\.schema\.prompt\ /from langchain_core.prompt_values /g'
git grep -l 'from langchain.schema.language_model' {docs,templates,cookbook} | xargs sed -i '' 's/from langchain\.schema\.language_model/from langchain_core.language_models/g'
```
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_openai import ChatOpenAI\n",
"\n",
"\n",
"# Generate summaries of text elements\n",
"def generate_text_summaries(texts, tables, summarize_texts=False):\n",
"    \"\"\"\n",
"    Summarize text elements\n",
"    texts: List of str\n",
"    tables: List of str\n",
"    summarize_texts: Bool to summarize texts\n",
"    \"\"\"\n",
"\n",
"    # Prompt\n",
"    prompt_text = \"\"\"You are an assistant tasked with summarizing tables and text for retrieval. \\\n",
"    These summaries will be embedded and used to retrieve the raw text or table elements. \\\n",
"    Give a concise summary of the table or text that is well optimized for retrieval. Table or text: {element} \"\"\"\n",
"    prompt = ChatPromptTemplate.from_template(prompt_text)\n",
"\n",
"    # Text summary chain\n",
"    model = ChatOpenAI(temperature=0, model=\"gpt-4\")\n",
"    summarize_chain = {\"element\": lambda x: x} | prompt | model | StrOutputParser()\n",
"\n",
"    # Initialize empty summaries\n",
"    text_summaries = []\n",
"    table_summaries = []\n",
"\n",
"    # Apply to text if texts are provided and summarization is requested\n",
"    if texts and summarize_texts:\n",
"        text_summaries = summarize_chain.batch(texts, {\"max_concurrency\": 5})\n",
"    elif texts:\n",
"        text_summaries = texts\n",
"\n",
"    # Apply to tables if tables are provided\n",
"    if tables:\n",
"        table_summaries = summarize_chain.batch(tables, {\"max_concurrency\": 5})\n",
"\n",
"    return text_summaries, table_summaries\n",
"\n",
"\n",
"# Get text, table summaries\n",
"text_summaries, table_summaries = generate_text_summaries(\n",
"    texts_4k_token, tables, summarize_texts=True\n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "b1feadda-8171-4aed-9a60-320a88dc9ee1",
"metadata": {},
"source": [
"### Image summaries \n",
"\n",
"We will use [GPT-4V](https://openai.com/research/gpt-4v-system-card) to produce the image summaries.\n",
"\n",
"See the API docs [here](https://platform.openai.com/docs/guides/vision):\n",
"\n",
"* We pass base64 encoded images"
]
},
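To make the base64 step concrete, here is a minimal standalone sketch of how raw image bytes can be turned into a data URL and placed into a text-plus-image message payload. The helper names `to_data_url` and `build_image_message_content` are hypothetical and used only for illustration; the payload shape mirrors the `HumanMessage` content used in the next cell, and the bytes are a stand-in rather than a real image.

```python
import base64


def to_data_url(image_bytes, mime_type="image/jpeg"):
    """Base64-encode raw image bytes and wrap them in a data URL."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


def build_image_message_content(prompt, image_bytes):
    """Build the list-of-parts content for a single text + image message."""
    return [
        {"type": "text", "text": prompt},
        {"type": "image_url", "image_url": {"url": to_data_url(image_bytes)}},
    ]


# Stand-in bytes: the JPEG magic number followed by padding (not a real image)
fake_jpeg = b"\xff\xd8\xff" + b"\x00" * 8
content = build_image_message_content("Summarize this image.", fake_jpeg)
print(content[1]["image_url"]["url"][:40])
```

In the notebook itself, the bytes come from the `.jpg` files on disk, and the resulting list is passed as the `content` of a `HumanMessage`.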
{
"cell_type": "code",
"execution_count": 3,
"id": "9e6b1d97-4245-45ac-95ba-9bc1cfd10182",
"metadata": {},
"outputs": [],
"source": [
"import base64\n",
"import os\n",
"\n",
"from langchain_core.messages import HumanMessage\n",
"\n",
"\n",
"def encode_image(image_path):\n",
"    \"\"\"Getting the base64 string\"\"\"\n",
"    with open(image_path, \"rb\") as image_file:\n",
"        return base64.b64encode(image_file.read()).decode(\"utf-8\")\n",
"\n",
"\n",
"def image_summarize(img_base64, prompt):\n",
"    \"\"\"Make image summary\"\"\"\n",
"    chat = ChatOpenAI(model=\"gpt-4-vision-preview\", max_tokens=1024)\n",
"\n",
"    msg = chat.invoke(\n",
"        [\n",
"            HumanMessage(\n",
"                content=[\n",
"                    {\"type\": \"text\", \"text\": prompt},\n",
"                    {\n",
"                        \"type\": \"image_url\",\n",
"                        \"image_url\": {\"url\": f\"data:image/jpeg;base64,{img_base64}\"},\n",
"                    },\n",
"                ]\n",
"            )\n",
"        ]\n",
"    )\n",
"    return msg.content\n",
"\n",
"\n",
"def generate_img_summaries(path):\n",
"    \"\"\"\n",
"    Generate summaries and base64 encoded strings for images\n",
"    path: Path to list of .jpg files extracted by Unstructured\n",
"    \"\"\"\n",
"\n",
"    # Store base64 encoded images\n",
"    img_base64_list = []\n",
"\n",
"    # Store image summaries\n",
"    image_summaries = []\n",
"\n",
"    # Prompt\n",
"    prompt = \"\"\"You are an assistant tasked with summarizing images for retrieval. \\\n",
"    These summaries will be embedded and used to retrieve the raw image. \\\n",
"    Give a concise summary of the image that is well optimized for retrieval.\"\"\"\n",
"\n",
"    # Apply to images\n",
"    for img_file in sorted(os.listdir(path)):\n",
"        if img_file.endswith(\".jpg\"):\n",
"            img_path = os.path.join(path, img_file)\n",
"            base64_image = encode_image(img_path)\n",
"            img_base64_list.append(base64_image)\n",
"            image_summaries.append(image_summarize(base64_image, prompt))\n",
"\n",
"    return img_base64_list, image_summaries\n",
"\n",
"\n",
"# Image summaries\n",
"img_base64_list, image_summaries = generate_img_summaries(fpath)"
]
},
{
"cell_type": "markdown",
"id": "67b030d4-2ac5-41b6-9245-fc3ba5771d87",
"metadata": {},
"source": [
"### Add to vectorstore\n",
"\n",
"Add raw docs and doc summaries to [Multi Vector Retriever](https://python.langchain.com/docs/modules/data_connection/retrievers/multi_vector#summary): \n",
"\n",
"* Store the raw texts, tables, and images in the `docstore`.\n",
Update Multi_modal_RAG.ipynb (#15378)
Updated comment for better understanding
2024-01-01 21:17:23 +00:00
"* Store the texts, table summaries, and image summaries in the `vectorstore` for efficient semantic retrieval."
]
},
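The split between `vectorstore` and `docstore` can be illustrated with a toy, dependency-free sketch. It is not LangChain's implementation: the names `build_stores` and `retrieve_raw` are hypothetical, a plain list and dict stand in for Chroma and `InMemoryStore`, and word overlap stands in for embedding similarity. What it shows is the linking idea only: each summary carries a `doc_id` in its metadata, and that ID is used to look up the raw content.

```python
import uuid


def build_stores(summaries, raw_contents, id_key="doc_id"):
    """Link each summary to its raw content through a shared generated ID."""
    if len(summaries) != len(raw_contents):
        raise ValueError("summaries and raw_contents must have the same length")
    doc_ids = [str(uuid.uuid4()) for _ in raw_contents]
    # Toy "vectorstore": summary text plus metadata pointing at the raw document
    index = [
        {"summary": s, "metadata": {id_key: doc_id}}
        for s, doc_id in zip(summaries, doc_ids)
    ]
    # Toy "docstore": raw content keyed by the same ID
    docstore = dict(zip(doc_ids, raw_contents))
    return index, docstore


def retrieve_raw(query, index, docstore, id_key="doc_id"):
    """Pick the summary sharing the most words with the query; return its raw content."""
    query_words = set(query.lower().split())

    def overlap(entry):
        return len(query_words & set(entry["summary"].lower().split()))

    best = max(index, key=overlap)
    return docstore[best["metadata"][id_key]]


index, docstore = build_stores(
    ["bar chart of revenue growth", "table of valuation multiples"],
    ["<base64 image placeholder>", "<raw table placeholder>"],
)
print(retrieve_raw("revenue growth chart", index, docstore))
```

The length check matters in practice: summaries and raw contents are paired by position, so each list of summaries should be generated from exactly the list of raw items it is stored with.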
{
"cell_type": "code",
"execution_count": 34,
"id": "24a0a289-b970-49fe-b04f-5d857a4c159b",
"metadata": {},
"outputs": [],
"source": [
"import uuid\n",
"\n",
"from langchain.retrievers.multi_vector import MultiVectorRetriever\n",
"from langchain.storage import InMemoryStore\n",
"from langchain_community.vectorstores import Chroma\n",
"from langchain_core.documents import Document\n",
"from langchain_openai import OpenAIEmbeddings\n",
"\n",
"\n",
"def create_multi_vector_retriever(\n",
"    vectorstore, text_summaries, texts, table_summaries, tables, image_summaries, images\n",
"):\n",
"    \"\"\"\n",
"    Create retriever that indexes summaries, but returns raw images or texts\n",
"    \"\"\"\n",
"\n",
"    # Initialize the storage layer\n",
"    store = InMemoryStore()\n",
"    id_key = \"doc_id\"\n",
"\n",
"    # Create the multi-vector retriever\n",
"    retriever = MultiVectorRetriever(\n",
"        vectorstore=vectorstore,\n",
"        docstore=store,\n",
"        id_key=id_key,\n",
"    )\n",
"\n",
"    # Helper function to add documents to the vectorstore and docstore\n",
"    def add_documents(retriever, doc_summaries, doc_contents):\n",
"        doc_ids = [str(uuid.uuid4()) for _ in doc_contents]\n",
"        summary_docs = [\n",
"            Document(page_content=s, metadata={id_key: doc_ids[i]})\n",
"            for i, s in enumerate(doc_summaries)\n",
"        ]\n",
"        retriever.vectorstore.add_documents(summary_docs)\n",
"        retriever.docstore.mset(list(zip(doc_ids, doc_contents)))\n",
"\n",
"    # Add texts, tables, and images\n",
"    # Check that text_summaries is not empty before adding\n",
"    if text_summaries:\n",
"        add_documents(retriever, text_summaries, texts)\n",
"    # Check that table_summaries is not empty before adding\n",
"    if table_summaries:\n",
"        add_documents(retriever, table_summaries, tables)\n",
"    # Check that image_summaries is not empty before adding\n",
"    if image_summaries:\n",
"        add_documents(retriever, image_summaries, images)\n",
"\n",
"    return retriever\n",
"\n",
"\n",
"# The vectorstore to use to index the summaries\n",
"vectorstore = Chroma(\n",
"    collection_name=\"mm_rag_cj_blog\", embedding_function=OpenAIEmbeddings()\n",
")\n",
"\n",
"# Create retriever\n",
"retriever_multi_vector_img = create_multi_vector_retriever(\n",
"    vectorstore,\n",
"    text_summaries,\n",
"    texts_4k_token,  # raw chunks the text summaries were generated from\n",
"    table_summaries,\n",
"    tables,\n",
"    image_summaries,\n",
"    img_base64_list,\n",
")"
]
},
{
"cell_type": "markdown",
"id": "69060724-e390-4dda-8250-5f86025c874a",
"metadata": {},
"source": [
"## RAG\n",
"\n",
"### Build retriever\n",
"\n",
"We need to sort the retrieved docs (raw images vs. texts or tables) into the correct parts of the GPT-4V prompt template."
]
},
{
"cell_type": "code",
"execution_count": 47,
"id": "771a47fa-1267-4db8-a6ae-5fde48bbc069",
"metadata": {},
"outputs": [],
"source": [
"import io\n",
"import re\n",
"\n",
"from IPython.display import HTML, display\n",
"from langchain_core.runnables import RunnableLambda, RunnablePassthrough\n",
"from PIL import Image\n",
"\n",
"\n",
"def plt_img_base64(img_base64):\n",
"    \"\"\"Display base64 encoded string as image\"\"\"\n",
"    # Create an HTML img tag with the base64 string as the source\n",
"    image_html = f'<img src=\"data:image/jpeg;base64,{img_base64}\" />'\n",
"    # Display the image by rendering the HTML\n",
"    display(HTML(image_html))\n",
"\n",
"\n",
"def looks_like_base64(sb):\n",
"    \"\"\"Check if the string looks like base64\"\"\"\n",
"    return re.match(\"^[A-Za-z0-9+/]+[=]{0,2}$\", sb) is not None\n",
"\n",
"\n",
"def is_image_data(b64data):\n",
"    \"\"\"\n",
"    Check if the base64 data is an image by looking at the start of the data\n",
"    \"\"\"\n",
"    image_signatures = {\n",
"        b\"\\xFF\\xD8\\xFF\": \"jpg\",\n",
"        b\"\\x89\\x50\\x4E\\x47\\x0D\\x0A\\x1A\\x0A\": \"png\",\n",
"        b\"\\x47\\x49\\x46\\x38\": \"gif\",\n",
"        b\"\\x52\\x49\\x46\\x46\": \"webp\",\n",
"    }\n",
"    try:\n",
"        header = base64.b64decode(b64data)[:8]  # Decode and get the first 8 bytes\n",
"        for sig, format in image_signatures.items():\n",
"            if header.startswith(sig):\n",
"                return True\n",
"        return False\n",
"    except Exception:\n",
"        return False\n",
"\n",
"\n",
"def resize_base64_image(base64_string, size=(128, 128)):\n",
"    \"\"\"\n",
"    Resize an image encoded as a Base64 string\n",
"    \"\"\"\n",
"    # Decode the Base64 string\n",
"    img_data = base64.b64decode(base64_string)\n",
"    img = Image.open(io.BytesIO(img_data))\n",
"\n",
"    # Resize the image\n",
"    resized_img = img.resize(size, Image.LANCZOS)\n",
"\n",
"    # Save the resized image to a bytes buffer\n",
"    buffered = io.BytesIO()\n",
"    resized_img.save(buffered, format=img.format)\n",
"\n",
"    # Encode the resized image to Base64\n",
"    return base64.b64encode(buffered.getvalue()).decode(\"utf-8\")\n",
"\n",
"\n",
"def split_image_text_types(docs):\n",
"    \"\"\"\n",
"    Split base64-encoded images and texts\n",
"    \"\"\"\n",
"    b64_images = []\n",
"    texts = []\n",
"    for doc in docs:\n",
"        # Check if the document is of type Document and extract page_content if so\n",
"        if isinstance(doc, Document):\n",
"            doc = doc.page_content\n",
"        if looks_like_base64(doc) and is_image_data(doc):\n",
"            doc = resize_base64_image(doc, size=(1300, 600))\n",
"            b64_images.append(doc)\n",
"        else:\n",
"            texts.append(doc)\n",
"    return {\"images\": b64_images, \"texts\": texts}\n",
"\n",
"\n",
"def img_prompt_func(data_dict):\n",
"    \"\"\"\n",
"    Build the GPT-4V message from the retrieved images and text\n",
"    \"\"\"\n",
"    formatted_texts = \"\\n\".join(data_dict[\"context\"][\"texts\"])\n",
"    messages = []\n",
"\n",
"    # Adding image(s) to the messages if present\n",
"    if data_dict[\"context\"][\"images\"]:\n",
"        for image in data_dict[\"context\"][\"images\"]:\n",
"            image_message = {\n",
"                \"type\": \"image_url\",\n",
"                \"image_url\": {\"url\": f\"data:image/jpeg;base64,{image}\"},\n",
"            }\n",
"            messages.append(image_message)\n",
"\n",
"    # Adding the text for analysis\n",
"    text_message = {\n",
"        \"type\": \"text\",\n",
"        \"text\": (\n",
"            \"You are a financial analyst tasked with providing investment advice.\\n\"\n",
"            \"You will be given a mix of text, tables, and image(s), usually of charts or graphs.\\n\"\n",
"            \"Use this information to provide investment advice related to the user question. \\n\"\n",
"            f\"User-provided question: {data_dict['question']}\\n\\n\"\n",
"            \"Text and / or tables:\\n\"\n",
"            f\"{formatted_texts}\"\n",
"        ),\n",
"    }\n",
"    messages.append(text_message)\n",
"    return [HumanMessage(content=messages)]\n",
"\n",
"\n",
"def multi_modal_rag_chain(retriever):\n",
"    \"\"\"\n",
"    Multi-modal RAG chain\n",
"    \"\"\"\n",
"\n",
"    # Multi-modal LLM\n",
"    model = ChatOpenAI(temperature=0, model=\"gpt-4-vision-preview\", max_tokens=1024)\n",
"\n",
"    # RAG pipeline\n",
"    chain = (\n",
"        {\n",
"            \"context\": retriever | RunnableLambda(split_image_text_types),\n",
"            \"question\": RunnablePassthrough(),\n",
"        }\n",
"        | RunnableLambda(img_prompt_func)\n",
"        | model\n",
"        | StrOutputParser()\n",
"    )\n",
"\n",
"    return chain\n",
"\n",
"\n",
"# Create RAG chain\n",
"chain_multimodal_rag = multi_modal_rag_chain(retriever_multi_vector_img)"
]
},
{
"cell_type": "markdown",
"id": "087ab1e2-fc9a-42af-9d93-a35dd172b130",
"metadata": {},
"source": [
"### Check\n",
"\n",
"Examine retrieval; we get back images that are relevant to our question."
]
},
{
"cell_type": "code",
"execution_count": 48,
"id": "9f4695c6-7374-4284-b2fe-a94ac17b630f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"4"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Check retrieval\n",
"query = \"Give me company names that are interesting investments based on EV / NTM and NTM rev growth. Consider EV / NTM multiples vs historical?\"\n",
"docs = retriever_multi_vector_img.get_relevant_documents(query, limit=6)\n",
"\n",
"# We get 4 docs\n",
"len(docs)"
]
},
{
"cell_type": "code",
"execution_count": 57,
"id": "b7a2b0e0-87eb-4e1b-a3f0-067cbf288ef6",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"4"
]
},
"execution_count": 57,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Check retrieval\n",
"query = \"What are the EV / NTM and NTM rev growth for MongoDB, Cloudflare, and Datadog?\"\n",
"docs = retriever_multi_vector_img.get_relevant_documents(query, limit=6)\n",
"\n",
"# We get 4 docs\n",
"len(docs)"
]
},
{
"cell_type": "code",
"execution_count": 61,
"id": "94c74413-9dd7-4337-bdca-05e9ee151f27",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<img src=\"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# We get back relevant images\n",
"plt_img_base64(docs[0])"
]
},
{
"cell_type": "markdown",
"id": "2bdfd863-c756-4cb4-b7be-ea00284687d2",
"metadata": {},
"source": [
"### Sanity Check\n",
"\n",
"Why does this work? Let's look back at the image that we stored ..."
]
},
{
"cell_type": "code",
"execution_count": 45,
"id": "bab422d9-104f-4e47-9760-96bdfdd3a9cf",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<img src=\"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"plt_img_base64(img_base64_list[3])"
]
},
{
"cell_type": "markdown",
"id": "3e3afd3b-4482-49af-995a-083a8af8eb57",
"metadata": {},
"source": [
"... here is the corresponding summary, which we embedded and used in similarity search.\n",
"\n",
"It's pretty reasonable that this image is indeed retrieved from our `query` based on its similarity to this summary."
]
},
{
"cell_type": "code",
"execution_count": 44,
"id": "bfb944f3-712b-4bdb-9396-6d4afdc62af4",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'The image is a data table comparing key financial metrics of ten technology companies. Metrics include Enterprise Value to Next Twelve Months Revenue (EV/NTM Rev), EV to 2024 Revenue (EV/2024 Rev), EV to NTM Free Cash Flow (EV/NTM FCF), NTM Revenue Growth, Gross Margin, Operating Margin, Free Cash Flow Margin (FCF Margin), and the percentage in Top 10 Multiple Last Twelve Months (LTM). The table lists averages and medians for these metrics, including an overall median for reference. It features the logo of Altimeter and the watermark \"@jaminball\" at the bottom.'"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"image_summaries[3]"
]
},
{
"cell_type": "markdown",
"id": "a60c457c-a675-4689-a6c4-a843f28a9c23",
"metadata": {},
"source": [
"### RAG\n",
"\n",
"Now let's run RAG and test the ability to synthesize an answer to our question."
]
},
{
"cell_type": "code",
"execution_count": 62,
"id": "9c64b19e-5a89-4dda-af38-fcc4a36a1b44",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\"Based on the first image provided, which appears to be a table of financial metrics for various companies, we can extract the following information for MongoDB, Cloudflare, and Datadog:\\n\\nMongoDB:\\n- EV / NTM Rev: 14.6x\\n- NTM Rev Growth: 17%\\n\\nCloudflare:\\n- EV / NTM Rev: 13.4x\\n- NTM Rev Growth: 28%\\n\\nDatadog:\\n- EV / NTM Rev: 13.1x\\n- NTM Rev Growth: 19%\\n\\nThese figures represent the enterprise value to next twelve months' revenue (EV / NTM Rev) multiple and the projected revenue growth for the next twelve months (NTM Rev Growth) for each company. These metrics are often used by investors to assess the valuation and growth prospects of companies, particularly in the technology sector.\""
]
},
"execution_count": 62,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Run RAG chain\n",
"chain_multimodal_rag.invoke(query)"
]
},
{
"cell_type": "markdown",
"id": "dea241f1-bd11-45cb-bb33-c4e2e8286855",
"metadata": {},
"source": [
"Here are the traces, where we can see what is passed to the LLM:\n",
" \n",
"* Question 1: [Trace focused on investment advice](https://smith.langchain.com/public/d77b7b52-4128-4772-82a7-c56eb97e8b97/r)\n",
"* Question 2: [Trace focused on table extraction](https://smith.langchain.com/public/4624f086-1bd7-4284-9ca9-52fd7e7a4568/r)\n",
"\n",
"For question 1, we can see that we pass 3 images along with a text chunk:"
]
},
{
"attachments": {
"2f72d65f-e9b5-4e2e-840a-8d111792d20b.png": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAACSAAAARWCAYAAAAhA46+AAAMQGlDQ1BJQ0MgUHJvZmlsZQAASImVVwdYU8kWnluSkJAQIICAlNCbIFIDSAmhBZBeBBshCRBKjIGgYkcXFVy7iIANXRVR7IDYETuLYu+LBRVlXSzYlTcpoOu+8r35vrnz33/O/OfMuTP33gGAfpwnkeSimgDkiQukcaGBzNEpqUzSU0AEdEAFVkCLx8+XsGNiIgEsA+3fy7vrAJG3VxzlWv/s/69FSyDM5wOAxECcLsjn50G8HwC8mi+RFgBAlPMWkwskcgwr0JHCACFeIMeZSlwtx+lKvFthkxDHgbgVADUqjyfNBEDjEuSZhfxMqKHRC7GzWCASA0BnQuyXlzdRAHEaxLbQRgKxXJ+V/oNO5t800wc1ebzMQayci6KoBYnyJbm8qf9nOv53ycuVDfiwhpWaJQ2Lk88Z5u1mzsQIOaZC3CNOj4qGWBviDyKBwh5ilJIlC0tU2qNG/HwOzBnQg9hZwAuKgNgI4hBxblSkik/PEIVwIYYrBJ0iKuAmQKwP8QJhfnC8ymaDdGKcyhfakCHlsFX8WZ5U4Vfu674sJ5Gt0n+dJeSq9DGNoqyEZIgpEFsWipKiINaA2Ck/Jz5CZTOyKIsTNWAjlcXJ47eEOE4oDg1U6mOFGdKQOJV9aV7+wHyxDVkibpQK7y3ISghT5gdr5fMU8cO5YJeEYnbigI4wf3TkwFwEwqBg5dyxZ0JxYrxK54OkIDBOORanSHJjVPa4uTA3VM6bQ+yWXxivGosnFcAFqdTHMyQFMQnKOPGibF54jDIefCmIBBwQBJhABms6mAiygai9p7EH3il7QgAPSEEmEAJHFTMwIlnRI4bXeFAE/oRICPIHxwUqeoWgEPJfB1nl1RFkKHoLFSNywBOI80AEyIX3MsUo8aC3JPAYMqJ/eOfByofx5sIq7//3/AD7nWFDJlLFyAY8MukDlsRgYhAxjBhCtMMNcT/cB4+E1wBYXXAW7jUwj+/2hCeEDsJDwjVCJ+HWBFGx9KcoR4FOqB+iykX6j7nAraGmOx6I+0J1qIzr4YbAEXeDfti4P/TsDlmOKm55Vpg/af9tBj88DZUd2ZmMkoeQA8i2P4/UsNdwH1SR5/rH/ChjTR/MN2ew52f/nB+yL4BtxM+W2AJsH3YGO4Gdww5jjYCJHcOasDbsiBwPrq7HitU14C1OEU8O1BH9w9/Ak5VnMt+5zrnb+Yuyr0A4Rf6OBpyJkqlSUWZWAZMNvwhCJlfMdxrGdHF2cQVA/n1Rvr7exCq+G4he23du7h8A+B7r7+8/9J0LPwbAHk+4/Q9+52xZ8NOhDsDZg3yZtFDJ4fILAb4l6HCnGQATYAFs4XxcgAfwAQEgGISDaJAAUsB4GH0WXOdSMBlMB3NACSgDS8EqUAnWg01gG9gJ9oJGcBicAKfBBXAJXAN34OrpAi9AL3gHPiMIQkJoCAMxQEwRK8QBcUFYiB8SjEQicUgKkoZkImJEhkxH5iJlyHKkEtmI1CJ7kIPICeQc0oHcQh4g3chr5BOKoVRUBzVGrdHhKAtloxFoAjoOzUQnoUXoPHQxWoHWoDvQBvQEegG9hnaiL9A+DGDqmB5mhjliLIyDRWOpWAYmxWZipVg5VoPVY83wOV/BOrEe7CNOxBk4E3eEKzgMT8T5+CR8Jr4Ir8S34Q14K34Ff4D34t8INIIRwYHgTeASRhMyCZMJJYRywhbCAcIpuJe6CO+IRKIe0YboCfdiCjGbOI24iLiWuIt4nNhBfETsI5FIBiQHki8pmsQjFZBKSGtIO0jHSJdJXaQPaupqpmouaiFqqWpitWK1crXtakfVLqs9VftM1iRbkb3J0WQBeSp5CXkzuZl8kdxF/kzRothQfCkJlGzKHEoFpZ5yinKX8kZdXd1c3Us9Vl2kPlu9Qn23+ln1B+ofqdpUeyqHOpYqoy6mbqUep96ivqHRaNa0AF
oqrYC2mFZLO0m7T/ugwdBw0uBqCDRmaVRpNGhc1nhJJ9Ot6Gz6eHoRvZy+j36R3qNJ1rTW5GjyNGdqVmke1Lyh2afF0BqhFa2Vp7VIa7vWOa1n2iRta+1gbYH2PO1N2ie1HzEwhgWDw+Az5jI2M04xunSIOjY6XJ1snTKdnTrtOr262rpuukm6U3SrdI/oduphetZ6XL1cvSV6e/Wu630aYjyEPUQ4ZOGQ+iGXh7zXH6ofoC/UL9XfpX9N/5MB0yDYIMdgmUGjwT1D3NDeMNZwsuE6w1OGPUN1hvoM5Q8tHbp36G0j1MjeKM5omtEmozajPmMT41BjifEa45PGPSZ6JgEm2SYrTY6adJsyTP1MRaYrTY+ZPmfqMtnMXGYFs5XZa2ZkFmYmM9to1m722dzGPNG82HyX+T0LigXLIsNipUWLRa+lqeUoy+mWdZa3rchWLKssq9VWZ6zeW9tYJ1vPt260fmajb8O1KbKps7lrS7P1t51kW2N71Y5ox7LLsVtrd8ketXe3z7Kvsr/ogDp4OIgc1jp0DCMM8xomHlYz7IYj1ZHtWOhY5/jASc8p0qnYqdHp5XDL4anDlw0/M/ybs7tzrvNm5zsjtEeEjyge0TzitYu9C9+lyuWqK801xHWWa5PrKzcHN6HbOreb7gz3Ue7z3Vvcv3p4ekg96j26PS090zyrPW+wdFgxrEWss14Er0CvWV6HvT56e3gXeO/1/svH0SfHZ7vPs5E2I4UjN4985Gvuy/Pd6Nvpx/RL89vg1+lv5s/zr/F/GGARIAjYEvCUbcfOZu9gvwx0DpQGHgh8z/HmzOAcD8KCQoNKg9qDtYMTgyuD74eYh2SG1IX0hrqHTgs9HkYIiwhbFnaDa8zlc2u5veGe4TPCWyOoEfERlREPI+0jpZHNo9BR4aNWjLobZRUljmqMBtHc6BXR92JsYibFHIolxsbEVsU+iRsRNz3uTDwjfkL89vh3CYEJSxLuJNomyhJbkuhJY5Nqk94nByUvT+4cPXz0jNEXUgxTRClNqaTUpNQtqX1jgsesGtM11n1sydjr42zGTRl3brzh+NzxRybQJ/Am7EsjpCWnbU/7wovm1fD60rnp1em9fA5/Nf+FIECwUtAt9BUuFz7N8M1YnvEs0zdzRWZ3ln9WeVaPiCOqFL3KDsten/0+Jzpna05/bnLurjy1vLS8g2JtcY64daLJxCkTOyQOkhJJ5yTvSasm9UojpFvykfxx+U0FOvBHvk1mK/tF9qDQr7Cq8MPkpMn7pmhNEU9pm2o/deHUp0UhRb9Nw6fxp7VMN5s+Z/qDGewZG2ciM9NntsyymDVvVtfs0Nnb5lDm5Mz5vdi5eHnx27nJc5vnGc+bPe/RL6G/1JVolEhLbsz3mb9+Ab5AtKB9oevCNQu/lQpKz5c5l5WXfVnEX3T+1xG/VvzavzhjcfsSjyXrlhKXipdeX+a/bNtyreVFyx+tGLWiYSVzZenKt6smrDpX7la+fjVltWx1Z0VkRdMayzVL13ypzKq8VhVYtavaqHph9fu1grWX1wWsq19vvL5s/acNog03N4ZubKixrinfRNxUuOnJ5qTNZ35j/Va7xXBL2ZavW8VbO7fFbWut9ayt3W60fUkdWier694xdselnUE7m+od6zfu0ttVthvslu1+vidtz/W9EXtb9rH21e+32l99gHGgtAFpmNrQ25jV2NmU0tRxMPxgS7NP84FDToe2HjY7XHVE98iSo5Sj8472Hys61ndccrznROaJRy0TWu6cHH3yamtsa/upiFNnT4ecPnmGfebYWd+zh895nzt4nnW+8YLHhYY297YDv7v/fqDdo73houfFpktel5o7RnYcvex/+cSVoCunr3KvXrgWda3jeuL1mzfG3ui8Kbj57FburVe3C29/vjP7LuFu6T3Ne+X3je7X/GH3x65Oj84jD4IetD2Mf3jnEf/Ri8f5j790zXtCe1L+1PRp7TOXZ4e7Q7ovPR/zvOuF5MXnnp
I/tf6sfmn7cv9fAX+19Y7u7XolfdX/etEbgzdb37q9bemL6bv/Lu/d5/elHww+bPvI+njmU/Knp58nfyF9qfhq97X5
}
},
"cell_type": "markdown",
"id": "2352cfc7-ef05-4257-87f5-32ee0d89ef12",
"metadata": {},
"source": [
"![trace.png](attachment:2f72d65f-e9b5-4e2e-840a-8d111792d20b.png)"
]
},
{
"cell_type": "markdown",
"id": "857e6c08-8798-4159-b9d1-af2f048448b2",
"metadata": {},
"source": [
"### Considerations\n",
"\n",
"**Retrieval**\n",
" \n",
"* Retrieval is performed based upon similarity to image summaries as well as text chunks.\n",
"* This requires some careful consideration because image retrieval can fail if there are competing text chunks.\n",
"* To mitigate this, I produce larger (4k token) text chunks and summarize them for retrieval.\n",
"\n",
"**Image Size**\n",
"\n",
"* The quality of answer synthesis appears to be sensitive to image size, [as expected](https://platform.openai.com/docs/guides/vision).\n",
"* I'll do evals soon to test this more carefully."
]
}
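,
{
"cell_type": "markdown",
"id": "image-size-sketch-md",
"metadata": {},
"source": [
"As a rough sketch for probing the image-size sensitivity noted above: `resize_base64_image` stretches every retrieved image to a fixed `(1300, 600)`, which may distort charts whose aspect ratio differs.\n",
"\n",
"The helper below is hypothetical (`resize_base64_image_keep_aspect` is not part of the chain used in this notebook). It assumes Pillow's `Image.thumbnail`, which fits an image inside a bounding box while preserving aspect ratio. Treat it as one option to compare in evals, not a tested improvement."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "image-size-sketch-code",
"metadata": {},
"outputs": [],
"source": [
"import base64\n",
"import io\n",
"\n",
"from PIL import Image\n",
"\n",
"\n",
"def resize_base64_image_keep_aspect(base64_string, max_size=(1300, 600)):\n",
"    \"\"\"Hypothetical helper: shrink a base64 image to fit max_size, keeping aspect ratio\"\"\"\n",
"    img = Image.open(io.BytesIO(base64.b64decode(base64_string)))\n",
"    fmt = img.format or \"PNG\"  # Fall back to PNG if the format is not detected\n",
"    # thumbnail() resizes in place, preserves aspect ratio, and never enlarges\n",
"    img.thumbnail(max_size, Image.LANCZOS)\n",
"    buffered = io.BytesIO()\n",
"    img.save(buffered, format=fmt)\n",
"    return base64.b64encode(buffered.getvalue()).decode(\"utf-8\")\n",
"\n",
"\n",
"# Usage on a synthetic wide image (an assumption; swap in an entry of img_base64_list)\n",
"buf = io.BytesIO()\n",
"Image.new(\"RGB\", (2600, 600), \"white\").save(buf, format=\"PNG\")\n",
"resized = resize_base64_image_keep_aspect(base64.b64encode(buf.getvalue()).decode(\"utf-8\"))\n",
"# Show the (width, height) of the resized image\n",
"Image.open(io.BytesIO(base64.b64decode(resized))).size"
]
}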
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 5
}