langchain/cookbook/Semi_structured_and_multi_m...

{
 "cells": [
  {
   "attachments": {
    "eea9bc1b-828e-4598-b37b-3c2d92eebf7f.png": {
     "image/png": "iVBORw0KGgoAAAANSUhEUgAABpgAAAJ+CAYAAACn9QPUAAAMQGlDQ1BJQ0MgUHJvZmlsZQAASImVVwdYU8kWnluSkJAQIICAlNCbIFIDSAmhBZBeBBshCRBKjIGgYkcXFVy7iIANXRVR7IDYETuLYu+LBRVlXSzYlTcpoOu+8r35vrnz33/O/OfMuTP33gGAfpwnkeSimgDkiQukcaGBzNEpqUzSU0AEdEAFVkCLx8+XsGNiIgEsA+3fy7vrAJG3VxzlWv/s/69FSyDM5wOAxECcLsjn50G8HwC8mi+RFgBAlPMWkwskcgwr0JHCACFeIMeZSlwtx+lKvFthkxDHgbgVADUqjyfNBEDjEuSZhfxMqKHRC7GzWCASA0BnQuyXlzdRAHEaxLbQRgKxXJ+V/oNO5t800wc1ebzMQayci6KoBYnyJbm8qf9nOv53ycuVDfiwhpWaJQ2Lk88Z5u1mzsQIOaZC3CNOj4qGWBviDyKBwh5ilJIlC0tU2qNG/HwOzBnQg9hZwAuKgNgI4hBxblSkik/PEIVwIYYrBJ0iKuAmQKwP8QJhfnC8ymaDdGKcyhfakCHlsFX8WZ5U4Vfu674sJ5Gt0n+dJeSq9DGNoqyEZIgpEFsWipKiINaA2Ck/Jz5CZTOyKIsTNWAjlcXJ47eEOE4oDg1U6mOFGdKQOJV9aV7+wHyxDVkibpQK7y3ISghT5gdr5fMU8cO5YJeEYnbigI4wf3TkwFwEwqBg5dyxZ0JxYrxK54OkIDBOORanSHJjVPa4uTA3VM6bQ+yWXxivGosnFcAFqdTHMyQFMQnKOPGibF54jDIefCmIBBwQBJhABms6mAiygai9p7EH3il7QgAPSEEmEAJHFTMwIlnRI4bXeFAE/oRICPIHxwUqeoWgEPJfB1nl1RFkKHoLFSNywBOI80AEyIX3MsUo8aC3JPAYMqJ/eOfByofx5sIq7//3/AD7nWFDJlLFyAY8MukDlsRgYhAxjBhCtMMNcT/cB4+E1wBYXXAW7jUwj+/2hCeEDsJDwjVCJ+HWBFGx9KcoR4FOqB+iykX6j7nAraGmOx6I+0J1qIzr4YbAEXeDfti4P/TsDlmOKm55Vpg/af9tBj88DZUd2ZmMkoeQA8i2P4/UsNdwH1SR5/rH/ChjTR/MN2ew52f/nB+yL4BtxM+W2AJsH3YGO4Gdww5jjYCJHcOasDbsiBwPrq7HitU14C1OEU8O1BH9w9/Ak5VnMt+5zrnb+Yuyr0A4Rf6OBpyJkqlSUWZWAZMNvwhCJlfMdxrGdHF2cQVA/n1Rvr7exCq+G4he23du7h8A+B7r7+8/9J0LPwbAHk+4/Q9+52xZ8NOhDsDZg3yZtFDJ4fILAb4l6HCnGQATYAFs4XxcgAfwAQEgGISDaJAAUsB4GH0WXOdSMBlMB3NACSgDS8EqUAnWg01gG9gJ9oJGcBicAKfBBXAJXAN34OrpAi9AL3gHPiMIQkJoCAMxQEwRK8QBcUFYiB8SjEQicUgKkoZkImJEhkxH5iJlyHKkEtmI1CJ7kIPICeQc0oHcQh4g3chr5BOKoVRUBzVGrdHhKAtloxFoAjoOzUQnoUXoPHQxWoHWoDvQBvQEegG9hnaiL9A+DGDqmB5mhjliLIyDRWOpWAYmxWZipVg5VoPVY83wOV/BOrEe7CNOxBk4E3eEKzgMT8T5+CR8Jr4Ir8S34Q14K34Ff4D34t8INIIRwYHgTeASRhMyCZMJJYRywhbCAcIpuJe6CO+IRKIe0YboCfdiCjGbOI24iLiWuIt4nNhBfETsI5FIBiQHki8pmsQjFZBKSGtIO0jHSJdJXaQPaupqpmouaiFqqWpitWK1crXtakfVLqs9VftM1iRbkb3J0WQBeSp5CXkzuZl8kdxF/kzRothQfCkJlGzKHEoFpZ5yinKX8kZdXd1c3Us9Vl2kPlu9Qn23+ln1B+ofqdpUeyqHOpYqoy6mbqUep96ivqHRaNa0AFoqrYC2mFZLO0m7T/ugwdBw0uBqCDRmaVRpNGhc1nhJJ9Ot6Gz6eHoRvZy+j36R3qNJ1rTW5GjyNGdqVmke1Lyh2afF0BqhFa2Vp7VIa7vWOa1n2iRta+1gbYH2PO1N2ie1HzEwhgWDw+Az5jI2M04xunSIOjY6XJ1snTKdnTrtOr262rpuukm6U3SrdI/oduphetZ6XL1cvSV6e/Wu630aYjyEPUQ4ZOGQ+iGXh7zXH6ofoC/UL9XfpX9N/5MB0yDYIMdgmUGjwT1D3NDeMNZwsuE6w1OGPUN1hvoM5Q8tHbp36G0j1MjeKM5omtEmozajPmMT41BjifEa45PGPSZ6JgEm2SYrTY6adJsyTP1MRaYrTY+ZPmfqMtnMXGYFs5XZa2ZkFmYmM9to1m722dzGPNG82HyX+T0LigXLIsNipUWLRa+lqeUoy+mWdZa3rchWLKssq9VWZ6zeW9tYJ1vPt260fmajb8O1KbKps7lrS7P1t51kW2N71Y5ox7LLsVtrd8ketXe3z7Kvsr/ogDp4OIgc1jp0DCMM8xomHlYz7IYj1ZHtWOhY5/jASc8p0qnYqdHp5XDL4anDlw0/M/ybs7tzrvNm5zsjtEeEjyge0TzitYu9C9+lyuWqK801xHWWa5PrKzcHN6HbOreb7gz3Ue7z3Vvcv3p4ekg96j26PS090zyrPW+wdFgxrEWss14Er0CvWV6HvT56e3gXeO/1/svH0SfHZ7vPs5E2I4UjN4985Gvuy/Pd6Nvpx/RL89vg1+lv5s/zr/F/GGARIAjYEvCUbcfOZu9gvwx0DpQGHgh8z/HmzOAcD8KCQoNKg9qDtYMTgyuD74eYh2SG1IX0hrqHTgs9HkYIiwhbFnaDa8zlc2u5veGe4TPCWyOoEfERlREPI+0jpZHNo9BR4aNWjLobZRUljmqMBtHc6BXR92JsYibFHIolxsbEVsU+iRsRNz3uTDwjfkL89vh3CYEJSxLuJNomyhJbkuhJY5Nqk94nByUvT+4cPXz0jNEXUgxTRClNqaTUpNQtqX1jgsesGtM11n1sydjr42zGTRl3brzh+NzxRybQJ/Am7EsjpCWnbU/7wovm1fD60rnp1em9fA5/Nf+FIECwUtAt9BUuFz7N8M1YnvEs0zdzRWZ3ln9WeVaPiCOqFL3KDsten/0+Jzpna05/bnLurjy1vLS8g2JtcY64daLJxCkTOyQOkhJJ5yTvSasm9UojpFvykfxx+U0FOvBHvk1mK/tF9qDQr7Cq8MPkpMn7pmhNEU9pm2o/deHUp0UhRb9Nw6fxp7VMN5s+Z/qDGewZG2ciM9NntsyymDVvVtfs0Nnb5lDm5Mz5vdi5eHnx27nJc5vnGc+bPe/RL6G/1JVolEhLbsz3mb9+Ab5AtKB9oevCNQu/lQpKz5c5l5WXfVnEX3T+1xG/VvzavzhjcfsSjyXrlhKXipdeX+a/bNtyreVFyx+tGLWiYSVzZenKt6smrDpX7la+fjVltWx1Z0VkRdMayzVL13ypzKq8VhVYtavaqHph9fu1grWX1wWsq19vvL5s/acNog03N4ZubKixrinfRNxUuOnJ5qTNZ35j/Va7xXBL2ZavW8VbO7fFbWut9ayt3W60fUkdWier694xdselnUE7m+od6zfu0ttVthvslu1+vidtz/W9EXtb9rH21e+32l99gHGgtAFpmNrQ25jV2NmU0tRxMPxgS7NP84FDToe2HjY7XHVE98iSo5Sj8472Hys61ndccrznROaJRy0TWu6cHH3yamtsa/upiFNnT4ecPnmGfebYWd+zh895nzt4nnW+8YLHhYY297YDv7v/fqDdo73houfFpktel5o7RnYcvex/+cSVoCunr3KvXrgWda3jeuL1mzfG3ui8Kbj57FburVe3C29/vjP7LuFu6T3Ne+X3je7X/GH3x65Oj84jD4IetD2Mf3jnEf/Ri8f5j790zXtCe1L+1PRp7TOXZ4e7Q7ovPR/zvOuF5MXnnpI/tf6sfmn7cv9fAX+19Y7u7XolfdX/etEbgzdb37q9bemL6bv/Lu/d5/elHww+bPvI+njmU/Knp58nfyF9qfhq97X5
    }
   },
   "cell_type": "markdown",
   "id": "812a4dbc-fe04-4b84-bdf9-390045e30806",
   "metadata": {},
   "source": [
    "## Semi-structured and Multi-modal RAG\n",
    "\n",
    "Many documents contain a mixture of content types, including text, tables, and images. \n",
    "\n",
    "This cookbook shows how to perform RAG on documents with semi-structured data along with images: \n",
    "\n",
    "* We will use [Unstructured](https://unstructured.io/) to parse images, text, and tables from documents (PDFs).\n",
    "* We will use the [multi-vector retriever](https://python.langchain.com/docs/modules/data_connection/retrievers/multi_vector) to store raw tables, text, (optionally) images along with summaries for retrieval.\n",
    "* We will use [LCEL](https://python.langchain.com/docs/expression_language/) to implement the chains used.\n",
    "\n",
    "We will also introduce two different approaches for working with images.\n",
    "\n",
    "![ss_mm_RAG.png](attachment:eea9bc1b-828e-4598-b37b-3c2d92eebf7f.png)\n",
    "\n",
    "## Packages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "140580ef-5db0-43cc-a524-9c39e04d4df0",
   "metadata": {},
   "outputs": [],
   "source": [
    "! pip install langchain unstructured[all-docs] pydantic lxml"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "74b56bde-1ba0-4525-a11d-cab02c5659e4",
   "metadata": {},
   "source": [
    "## Data Loading\n",
    "\n",
    "### Partition PDF tables, text, and images\n",
    "  \n",
    "* `LLaVA` Paper: https://arxiv.org/pdf/2304.08485.pdf\n",
    "* Use `Unstructured` to partition elements"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "61cbb874-ecc0-4d5d-9954-f0a41f65e0d7",
   "metadata": {},
   "outputs": [],
   "source": [
    "path = \"/Users/rlm/Desktop/Papers/LLaVA/\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e98bdeb7-eb77-42e6-a3a5-c3f27a1838d5",
   "metadata": {},
   "outputs": [],
   "source": [
    "from lxml import html\n",
    "from pydantic import BaseModel\n",
    "from typing import Any, Optional\n",
    "from unstructured.partition.pdf import partition_pdf\n",
    "\n",
    "# Get elements\n",
    "raw_pdf_elements = partition_pdf(filename=path+\"LLaVA.pdf\",\n",
    "                                 # Using pdf format to find embedded image blocks\n",
    "                                 extract_images_in_pdf=True,\n",
    "                                 # Use layout model (YOLOX) to get bounding boxes (for tables) and find titles\n",
    "                                 # Titles are any sub-section of the document \n",
    "                                 infer_table_structure=True, \n",
    "                                 # Post processing to aggregate text once we have the title \n",
    "                                 chunking_strategy=\"by_title\",\n",
    "                                 # Chunking params to aggregate text blocks\n",
    "                                 # Attempt to create a new chunk 3800 chars\n",
    "                                 # Attempt to keep chunks > 2000 chars \n",
    "                                 # Hard max on chunks\n",
    "                                 max_characters=4000, \n",
    "                                 new_after_n_chars=3800, \n",
    "                                 combine_text_under_n_chars=2000,\n",
    "                                 image_output_dir_path=path)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "7cdba921-5419-4471-b234-d93af3859b6f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{\"<class 'unstructured.documents.elements.CompositeElement'>\": 31,\n",
       " \"<class 'unstructured.documents.elements.Table'>\": 3}"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Create a dictionary to store counts of each type\n",
    "category_counts = {}\n",
    "\n",
    "for element in raw_pdf_elements:\n",
    "    category = str(type(element))\n",
    "    if category in category_counts:\n",
    "        category_counts[category] += 1\n",
    "    else:\n",
    "        category_counts[category] = 1\n",
    "\n",
    "# Unique_categories will have unique elements\n",
    "unique_categories = set(category_counts.keys())\n",
    "category_counts"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "5f660305-e165-4b6c-ada3-a67a422defb5",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "3\n",
      "31\n"
     ]
    }
   ],
   "source": [
    "class Element(BaseModel):\n",
    "    type: str\n",
    "    text: Any\n",
    "\n",
    "# Categorize by type\n",
    "categorized_elements = []\n",
    "for element in raw_pdf_elements:\n",
    "    if \"unstructured.documents.elements.Table\" in str(type(element)):\n",
    "        categorized_elements.append(Element(type=\"table\", text=str(element)))\n",
    "    elif \"unstructured.documents.elements.CompositeElement\" in str(type(element)):\n",
    "        categorized_elements.append(Element(type=\"text\", text=str(element)))\n",
    "\n",
    "# Tables\n",
    "table_elements = [e for e in categorized_elements if e.type == \"table\"]\n",
    "print(len(table_elements))\n",
    "\n",
    "# Text\n",
    "text_elements = [e for e in categorized_elements if e.type == \"text\"]\n",
    "print(len(text_elements))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0aa7f52f-bf5c-4ba4-af72-b2ccba59a4cf",
   "metadata": {},
   "source": [
    "## Multi-vector retriever\n",
    "\n",
    "Use [multi-vector-retriever](https://python.langchain.com/docs/modules/data_connection/retrievers/multi_vector#summary).\n",
    "\n",
    "Summaries are used to retrieve raw tables and / or raw chunks of text.\n",
    "\n",
    "### Text and Table summaries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "523e6ed2-2132-4748-bdb7-db765f20648d",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.chat_models import ChatOpenAI\n",
    "from langchain.prompts import ChatPromptTemplate\n",
    "from langchain.schema.output_parser import StrOutputParser"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "22c22e3f-42fb-4a4a-a87a-89f10ba8ab99",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Prompt \n",
    "prompt_text=\"\"\"You are an assistant tasked with summarizing tables and text. \\ \n",
    "Give a concise summary of the table or text. Table or text chunk: {element} \"\"\"\n",
    "prompt = ChatPromptTemplate.from_template(prompt_text) \n",
    "\n",
    "# Summary chain \n",
    "model = ChatOpenAI(temperature=0,model=\"gpt-4\")\n",
    "summarize_chain = {\"element\": lambda x:x} | prompt | model | StrOutputParser()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "f176b374-aef0-48f4-a104-fb26b1dd6922",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Apply to text\n",
    "texts = [i.text for i in text_elements]\n",
    "text_summaries = summarize_chain.batch(texts, {\"max_concurrency\": 5})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "61a6ac00-ebbe-4608-9ae5-40f81541e37f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Apply to tables\n",
    "tables = [i.text for i in table_elements]\n",
    "table_summaries = summarize_chain.batch(tables, {\"max_concurrency\": 5})"
   ]
  },
  {
   "attachments": {
    "ef731a34-f227-459b-9d8a-ecccb27a348e.png": {
     "image/png": "iVBORw0KGgoAAAANSUhEUgAABrEAAAJwCAYAAAAut0q+AAAMQGlDQ1BJQ0MgUHJvZmlsZQAASImVVwdYU8kWnluSkJAQIICAlNCbIFIDSAmhBZBeBBshCRBKjIGgYkcXFVy7iIANXRVR7IDYETuLYu+LBRVlXSzYlTcpoOu+8r35vrnz33/O/OfMuTP33gGAfpwnkeSimgDkiQukcaGBzNEpqUzSU0AEdEAFVkCLx8+XsGNiIgEsA+3fy7vrAJG3VxzlWv/s/69FSyDM5wOAxECcLsjn50G8HwC8mi+RFgBAlPMWkwskcgwr0JHCACFeIMeZSlwtx+lKvFthkxDHgbgVADUqjyfNBEDjEuSZhfxMqKHRC7GzWCASA0BnQuyXlzdRAHEaxLbQRgKxXJ+V/oNO5t800wc1ebzMQayci6KoBYnyJbm8qf9nOv53ycuVDfiwhpWaJQ2Lk88Z5u1mzsQIOaZC3CNOj4qGWBviDyKBwh5ilJIlC0tU2qNG/HwOzBnQg9hZwAuKgNgI4hBxblSkik/PEIVwIYYrBJ0iKuAmQKwP8QJhfnC8ymaDdGKcyhfakCHlsFX8WZ5U4Vfu674sJ5Gt0n+dJeSq9DGNoqyEZIgpEFsWipKiINaA2Ck/Jz5CZTOyKIsTNWAjlcXJ47eEOE4oDg1U6mOFGdKQOJV9aV7+wHyxDVkibpQK7y3ISghT5gdr5fMU8cO5YJeEYnbigI4wf3TkwFwEwqBg5dyxZ0JxYrxK54OkIDBOORanSHJjVPa4uTA3VM6bQ+yWXxivGosnFcAFqdTHMyQFMQnKOPGibF54jDIefCmIBBwQBJhABms6mAiygai9p7EH3il7QgAPSEEmEAJHFTMwIlnRI4bXeFAE/oRICPIHxwUqeoWgEPJfB1nl1RFkKHoLFSNywBOI80AEyIX3MsUo8aC3JPAYMqJ/eOfByofx5sIq7//3/AD7nWFDJlLFyAY8MukDlsRgYhAxjBhCtMMNcT/cB4+E1wBYXXAW7jUwj+/2hCeEDsJDwjVCJ+HWBFGx9KcoR4FOqB+iykX6j7nAraGmOx6I+0J1qIzr4YbAEXeDfti4P/TsDlmOKm55Vpg/af9tBj88DZUd2ZmMkoeQA8i2P4/UsNdwH1SR5/rH/ChjTR/MN2ew52f/nB+yL4BtxM+W2AJsH3YGO4Gdww5jjYCJHcOasDbsiBwPrq7HitU14C1OEU8O1BH9w9/Ak5VnMt+5zrnb+Yuyr0A4Rf6OBpyJkqlSUWZWAZMNvwhCJlfMdxrGdHF2cQVA/n1Rvr7exCq+G4he23du7h8A+B7r7+8/9J0LPwbAHk+4/Q9+52xZ8NOhDsDZg3yZtFDJ4fILAb4l6HCnGQATYAFs4XxcgAfwAQEgGISDaJAAUsB4GH0WXOdSMBlMB3NACSgDS8EqUAnWg01gG9gJ9oJGcBicAKfBBXAJXAN34OrpAi9AL3gHPiMIQkJoCAMxQEwRK8QBcUFYiB8SjEQicUgKkoZkImJEhkxH5iJlyHKkEtmI1CJ7kIPICeQc0oHcQh4g3chr5BOKoVRUBzVGrdHhKAtloxFoAjoOzUQnoUXoPHQxWoHWoDvQBvQEegG9hnaiL9A+DGDqmB5mhjliLIyDRWOpWAYmxWZipVg5VoPVY83wOV/BOrEe7CNOxBk4E3eEKzgMT8T5+CR8Jr4Ir8S34Q14K34Ff4D34t8INIIRwYHgTeASRhMyCZMJJYRywhbCAcIpuJe6CO+IRKIe0YboCfdiCjGbOI24iLiWuIt4nNhBfETsI5FIBiQHki8pmsQjFZBKSGtIO0jHSJdJXaQPaupqpmouaiFqqWpitWK1crXtakfVLqs9VftM1iRbkb3J0WQBeSp5CXkzuZl8kdxF/kzRothQfCkJlGzKHEoFpZ5yinKX8kZdXd1c3Us9Vl2kPlu9Qn23+ln1B+ofqdpUeyqHOpYqoy6mbqUep96ivqHRaNa0AFoqrYC2mFZLO0m7T/ugwdBw0uBqCDRmaVRpNGhc1nhJJ9Ot6Gz6eHoRvZy+j36R3qNJ1rTW5GjyNGdqVmke1Lyh2afF0BqhFa2Vp7VIa7vWOa1n2iRta+1gbYH2PO1N2ie1HzEwhgWDw+Az5jI2M04xunSIOjY6XJ1snTKdnTrtOr262rpuukm6U3SrdI/oduphetZ6XL1cvSV6e/Wu630aYjyEPUQ4ZOGQ+iGXh7zXH6ofoC/UL9XfpX9N/5MB0yDYIMdgmUGjwT1D3NDeMNZwsuE6w1OGPUN1hvoM5Q8tHbp36G0j1MjeKM5omtEmozajPmMT41BjifEa45PGPSZ6JgEm2SYrTY6adJsyTP1MRaYrTY+ZPmfqMtnMXGYFs5XZa2ZkFmYmM9to1m722dzGPNG82HyX+T0LigXLIsNipUWLRa+lqeUoy+mWdZa3rchWLKssq9VWZ6zeW9tYJ1vPt260fmajb8O1KbKps7lrS7P1t51kW2N71Y5ox7LLsVtrd8ketXe3z7Kvsr/ogDp4OIgc1jp0DCMM8xomHlYz7IYj1ZHtWOhY5/jASc8p0qnYqdHp5XDL4anDlw0/M/ybs7tzrvNm5zsjtEeEjyge0TzitYu9C9+lyuWqK801xHWWa5PrKzcHN6HbOreb7gz3Ue7z3Vvcv3p4ekg96j26PS090zyrPW+wdFgxrEWss14Er0CvWV6HvT56e3gXeO/1/svH0SfHZ7vPs5E2I4UjN4985Gvuy/Pd6Nvpx/RL89vg1+lv5s/zr/F/GGARIAjYEvCUbcfOZu9gvwx0DpQGHgh8z/HmzOAcD8KCQoNKg9qDtYMTgyuD74eYh2SG1IX0hrqHTgs9HkYIiwhbFnaDa8zlc2u5veGe4TPCWyOoEfERlREPI+0jpZHNo9BR4aNWjLobZRUljmqMBtHc6BXR92JsYibFHIolxsbEVsU+iRsRNz3uTDwjfkL89vh3CYEJSxLuJNomyhJbkuhJY5Nqk94nByUvT+4cPXz0jNEXUgxTRClNqaTUpNQtqX1jgsesGtM11n1sydjr42zGTRl3brzh+NzxRybQJ/Am7EsjpCWnbU/7wovm1fD60rnp1em9fA5/Nf+FIECwUtAt9BUuFz7N8M1YnvEs0zdzRWZ3ln9WeVaPiCOqFL3KDsten/0+Jzpna05/bnLurjy1vLS8g2JtcY64daLJxCkTOyQOkhJJ5yTvSasm9UojpFvykfxx+U0FOvBHvk1mK/tF9qDQr7Cq8MPkpMn7pmhNEU9pm2o/deHUp0UhRb9Nw6fxp7VMN5s+Z/qDGewZG2ciM9NntsyymDVvVtfs0Nnb5lDm5Mz5vdi5eHnx27nJc5vnGc+bPe/RL6G/1JVolEhLbsz3mb9+Ab5AtKB9oevCNQu/lQpKz5c5l5WXfVnEX3T+1xG/VvzavzhjcfsSjyXrlhKXipdeX+a/bNtyreVFyx+tGLWiYSVzZenKt6smrDpX7la+fjVltWx1Z0VkRdMayzVL13ypzKq8VhVYtavaqHph9fu1grWX1wWsq19vvL5s/acNog03N4ZubKixrinfRNxUuOnJ5qTNZ35j/Va7xXBL2ZavW8VbO7fFbWut9ayt3W60fUkdWier694xdselnUE7m+od6zfu0ttVthvslu1+vidtz/W9EXtb9rH21e+32l99gHGgtAFpmNrQ25jV2NmU0tRxMPxgS7NP84FDToe2HjY7XHVE98iSo5Sj8472Hys61ndccrznROaJRy0TWu6cHH3yamtsa/upiFNnT4ecPnmGfebYWd+zh895nzt4nnW+8YLHhYY297YDv7v/fqDdo73houfFpktel5o7RnYcvex/+cSVoCunr3KvXrgWda3jeuL1mzfG3ui8Kbj57FburVe3C29/vjP7LuFu6T3Ne+X3je7X/GH3x65Oj84jD4IetD2Mf3jnEf/Ri8f5j790zXtCe1L+1PRp7TOXZ4e7Q7ovPR/zvOuF5MXnnpI/tf6sfmn7cv9fAX+19Y7u7XolfdX/etEbgzdb37q9bemL6bv/Lu/d5/elHww+bPvI+njmU/Knp58nfyF9qfhq97X5
    }
   },
   "cell_type": "markdown",
   "id": "b1feadda-8171-4aed-9a60-320a88dc9ee1",
   "metadata": {},
   "source": [
    "### Images\n",
    "\n",
    "There are at least three ways to add images to the RAG stack.\n",
    "\n",
    "`(1) Produce text summaries of images with a multi-modal LLM.`\n",
    "\n",
    "* 1. Embed, store, and retrieve text summaries like any other text chunk.\n",
    "\n",
    "`(2) Retrieve images.`\n",
    "\n",
    "* 2a. Text-embeddings: Produce text summaries of images, embed them, and use them to fetch the raw image in retrieval. \n",
    "* 2b. Multi-modal embeddings: Directly embed images along with text.\n",
    "\n",
    "Below we show 1, but future extensions will show the other paths. \n",
    "\n",
    "![img_rag.png](attachment:ef731a34-f227-459b-9d8a-ecccb27a348e.png)\n",
    "\n",
    "#### Image summaries \n",
    "\n",
    "We will use [LLaVA](https://github.com/haotian-liu/LLaVA/), an open source multi-modal model.\n",
    " \n",
    "We will use [llama.cpp](https://github.com/ggerganov/llama.cpp/pull/3436) to run LLaVA locally (e.g., on a Mac laptop):\n",
    "\n",
    "* Clone [llama.cpp](https://github.com/ggerganov/llama.cpp)\n",
    "* Download the LLaVA model: `mmproj-model-f16.gguf` and one of `ggml-model-[f16|q5_k|q4_k].gguf` from [LLaVA 7b repo](https://huggingface.co/mys/ggml_llava-v1.5-7b/tree/main)\n",
    "* Build\n",
    "```\n",
    "mkdir build && cd build && cmake ..\n",
    "cmake --build .\n",
    "```\n",
    "* Run inference\n",
    "```\n",
    "/Users/rlm/Desktop/Code/llama.cpp/bin/llava -m ../models/llava-7b/ggml-model-q5_k.gguf --mmproj ../models/llava-7b/mmproj-model-f16.gguf --temp 0.1 -p \"Describe the image in detail. Be specific about graphs, such as bar plots.\" --image \"$img\" > \"$output_file\"\n",
    "```\n",
    "\n",
    "For [better performance](https://github.com/ggerganov/llama.cpp/issues/3602):\n",
    "\n",
    "* It appears `7b` is currently better than `13b` in the [LLaVA repo](https://huggingface.co/mys/ggml_llava-v1.5-7b/tree/main)\n",
    "* Use `--temp 0.1`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "440b20e4-a74d-4c75-b538-0ca24d581713",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "\n",
    "# Define the directory containing the images\n",
    "IMG_DIR=~/Desktop/Papers/LLaVA/\n",
    "\n",
    "# Loop through each image in the directory\n",
    "for img in \"${IMG_DIR}\"*.jpg; do\n",
    "    # Extract the base name of the image without extension\n",
    "    base_name=$(basename \"$img\" .jpg)\n",
    "\n",
    "    # Define the output file name based on the image name\n",
    "    output_file=\"${IMG_DIR}${base_name}.txt\"\n",
    "\n",
    "    # Execute the command and save the output to the defined output file\n",
    "    /Users/rlm/Desktop/Code/llama.cpp/bin/llava -m ../models/llava-7b/ggml-model-q5_k.gguf --mmproj ../models/llava-7b/mmproj-model-f16.gguf --temp 0.1 -p \"Describe the image in detail. Be specific about graphs, such as bar plots.\" --image \"$img\" > \"$output_file\"\n",
    "\n",
    "done"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a69dcd6b-0226-4173-a80d-36921824c824",
   "metadata": {},
   "source": [
    "Fetch / clean image summaries."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "54924f9e-0f81-4232-8efb-8485db1063c8",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os, glob\n",
    "\n",
    "# Get all .txt file summaries\n",
    "file_paths = glob.glob(os.path.expanduser(os.path.join(path, \"*.txt\")))\n",
    "\n",
    "# Read each file and store its content in a list\n",
    "img_summaries = []\n",
    "for file_path in file_paths:\n",
    "    with open(file_path, 'r') as file:\n",
    "        img_summaries.append(file.read())\n",
    "\n",
    "# Remove any logging prior to summary\n",
    "logging_header=\"clip_model_load: total allocated memory: 201.27 MB\\n\\n\"\n",
    "cleaned_img_summary = [s.split(logging_header, 1)[1].strip() for s in img_summaries]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "67b030d4-2ac5-41b6-9245-fc3ba5771d87",
   "metadata": {},
   "source": [
    "### Add to vectorstore\n",
    "\n",
    "Use [Multi Vector Retriever](https://python.langchain.com/docs/modules/data_connection/retrievers/multi_vector#summary) with summaries."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "d643cc61-827d-4f3c-8242-7a7c8291ed8a",
   "metadata": {},
   "outputs": [],
   "source": [
    "import uuid\n",
    "from langchain.vectorstores import Chroma\n",
    "from langchain.storage import InMemoryStore\n",
    "from langchain.schema.document import Document\n",
    "from langchain.embeddings import OpenAIEmbeddings\n",
    "from langchain.retrievers.multi_vector import MultiVectorRetriever\n",
    "\n",
    "# The vectorstore to use to index the child chunks\n",
    "vectorstore = Chroma(\n",
    "    collection_name=\"summaries\",\n",
    "    embedding_function=OpenAIEmbeddings()\n",
    ")\n",
    "\n",
    "# The storage layer for the parent documents\n",
    "store = InMemoryStore()\n",
    "id_key = \"doc_id\"\n",
    "\n",
    "# The retriever (empty to start)\n",
    "retriever = MultiVectorRetriever(\n",
    "    vectorstore=vectorstore, \n",
    "    docstore=store, \n",
    "    id_key=id_key,\n",
    ")\n",
    "\n",
    "# Add texts\n",
    "doc_ids = [str(uuid.uuid4()) for _ in texts]\n",
    "summary_texts = [Document(page_content=s,metadata={id_key: doc_ids[i]}) for i, s in enumerate(text_summaries)]\n",
    "retriever.vectorstore.add_documents(summary_texts)\n",
    "retriever.docstore.mset(list(zip(doc_ids, texts)))\n",
    "\n",
    "# Add tables\n",
    "table_ids = [str(uuid.uuid4()) for _ in tables]\n",
    "summary_tables = [Document(page_content=s,metadata={id_key: table_ids[i]}) for i, s in enumerate(table_summaries)]\n",
    "retriever.vectorstore.add_documents(summary_tables)\n",
    "retriever.docstore.mset(list(zip(table_ids, tables)))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b90572a0-0377-4598-8d12-bba22a51b655",
   "metadata": {},
   "source": [
    "For `option 1` (above): \n",
    "\n",
    "* Store the image summary in the `docstore`, which we return to the LLM for answer generation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "2e0f06f3-a5bc-4342-aee6-c3495d047e66",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Add image summaries\n",
    "img_ids = [str(uuid.uuid4()) for _ in cleaned_img_summary]\n",
    "summary_img = [Document(page_content=s,metadata={id_key: img_ids[i]}) for i, s in enumerate(cleaned_img_summary)]\n",
    "retriever.vectorstore.add_documents(summary_img)\n",
    "retriever.docstore.mset(list(zip(img_ids, cleaned_img_summary))) "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6d667e5c-5385-48c4-b878-51dcc03cc4d0",
   "metadata": {},
   "source": [
    "For `option 2a` (above): \n",
    "\n",
    "* Store the images in the `docstore`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8c75a7b3-04f3-41eb-97e5-61af49d92104",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Add images\n",
    "img_ids = [str(uuid.uuid4()) for _ in cleaned_img_summary]\n",
    "summary_img = [Document(page_content=s,metadata={id_key: img_ids[i]}) for i, s in enumerate(cleaned_img_summary)]\n",
    "retriever.vectorstore.add_documents(summary_img)\n",
    "### Fetch images\n",
    "retriever.docstore.mset(list(zip(img_ids, ### image ### ))) "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4b45fb81-46b1-426e-aa2c-01aed4eac700",
   "metadata": {},
   "source": [
    "### Sanity Check retrieval\n",
    "\n",
    "The most complex table in the paper:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "id": "a5f4dd59-005a-4ff8-ad51-ea2e50d79c10",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Subject Context Modality Grade Method NAT SOC LAN | TXT IMG NO | Gi6~ G7-12 | Average Representative & SoTA methods with numbers reported in the literature Human [30] 90.23 84.97 87.48 | 89.60 87.50 88.10 | 91.59 82.42 88.40 GPT-3.5 [30] 74.64 69.74 76.00 | 74.44 67.28 77.42 | 76.80 68.89 73.97 GPT-3.5 w/ CoT [30] 75.44 70.87 78.09 | 74.68 67.43 79.93 | 78.23 69.68 75.17 LLaMA-Adapter [55] 84.37 88.30 84.36 | 83.72 80.32 86.90 | 85.83 84.05 85.19 MM-CoT gase [57] 87.52 77.17 85.82 | 87.88 82.90 86.83 | 84.65 85.37 84.91 MM-CoT farge [57] 95.91 82.00 90.82 | 95.26 88.80 92.89 | 92.44 90.31 | 91.68 Results with our own experiment runs GPT-4 84.06 73.45 87.36 | 81.87 70.75 90.73 | 84.69 79.10 82.69 LLaVA 90.36 95.95 88.00 | 89.49 88.00 90.66 | 90.93 90.90 90.92 LLaVA+GPT-4 (complement) 90.36 95.50 88.55 | 89.05 87.80 91.08 | 92.22 88.73 90.97 LLaVA+GPT-4 (judge) 91.56 96.74 91.09 | 90.62 88.99 93.52 | 92.73 92.16 92.53'"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tables[2]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9f68ef8b-0fec-4b2f-a0d3-c440c74ebaa1",
   "metadata": {},
   "source": [
    "Here is the summary, which is embedded:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "id": "9eb16ea9-d932-4062-9ace-e8f77dee530b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'The table presents the performance of various methods in different subject contexts and modalities. The subjects are Natural Sciences (NAT), Social Sciences (SOC), and Language (LAN). The modalities are text (TXT), image (IMG), and no modality (NO). The methods include Human, GPT-3.5, GPT-3.5 with CoT, LLaMA-Adapter, MM-CoT gase, MM-CoT farge, GPT-4, LLaVA, LLaVA+GPT-4 (complement), and LLaVA+GPT-4 (judge). The performance is measured in grades from 6 to 12. The MM-CoT farge method had the highest performance in most categories, with LLaVA+GPT-4 (judge) showing the highest results in the experiment runs.'"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "table_summaries[2]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fc2bcc4c-c05d-4417-aaf9-78acd754dde6",
   "metadata": {},
   "source": [
    "Here is our retrieval of that table from the natural language query:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "id": "1bea75fe-85af-4955-a80c-6e0b44a8e215",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Subject Context Modality Grade Method NAT SOC LAN | TXT IMG NO | Gi6~ G7-12 | Average Representative & SoTA methods with numbers reported in the literature Human [30] 90.23 84.97 87.48 | 89.60 87.50 88.10 | 91.59 82.42 88.40 GPT-3.5 [30] 74.64 69.74 76.00 | 74.44 67.28 77.42 | 76.80 68.89 73.97 GPT-3.5 w/ CoT [30] 75.44 70.87 78.09 | 74.68 67.43 79.93 | 78.23 69.68 75.17 LLaMA-Adapter [55] 84.37 88.30 84.36 | 83.72 80.32 86.90 | 85.83 84.05 85.19 MM-CoT gase [57] 87.52 77.17 85.82 | 87.88 82.90 86.83 | 84.65 85.37 84.91 MM-CoT farge [57] 95.91 82.00 90.82 | 95.26 88.80 92.89 | 92.44 90.31 | 91.68 Results with our own experiment runs GPT-4 84.06 73.45 87.36 | 81.87 70.75 90.73 | 84.69 79.10 82.69 LLaVA 90.36 95.95 88.00 | 89.49 88.00 90.66 | 90.93 90.90 90.92 LLaVA+GPT-4 (complement) 90.36 95.50 88.55 | 89.05 87.80 91.08 | 92.22 88.73 90.97 LLaVA+GPT-4 (judge) 91.56 96.74 91.09 | 90.62 88.99 93.52 | 92.73 92.16 92.53'"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# We can retrieve this table\n",
    "retriever.get_relevant_documents(\"What are results for LLaMA across across domains / subjects?\")[1]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3dbb23d5-ae66-444d-8f5f-b24107fb9c57",
   "metadata": {},
   "source": [
    "Image:"
   ]
  },
  {
   "attachments": {
    "5d505f36-17e1-4fe5-a405-f01f7a392716.jpg": {
     "image/jpeg": "/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0aHBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/2wBDAQkJCQwLDBgNDRgyIRwhMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjL/wAARCAE4AQUDASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSExBhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwD3qaVYYHlbO1FLHAycCuT0/wARa9f2Vrq8emWsul3W1kihnLXCxsRhiMbSRnJUHjnk4rq5zItvIYkDyBSVVjgE9hntXm8ssJhQaBpmqaV4geZS9rHFKsCtuG8vx5RTGfmHJ7c0AdzJ4j0eLUl06TUrVL0kKIDKA2T0GPU+lNfxLosd6tk+p2q3LOUERlG7cO2PWuPuSY9A1Tw8+m3cmq3NxMY3FsxjdnkLJN5mNoCgqSScjbj0p82ktJ4Y8SRSWLPJcaqXwYjmRd8eGHHIwDz7UAdfaeItHv4J57TUrWaK35mdJQQg65J7DHeqN3420K30e71OLUYLiG1HziKQFs9hz3OOPWub8ZaPeXmsTmyt5TGLGBpPKjDeYsdwGKAEbWO3OFPXp3qG5tptVtdZuLeXVL6X+y5IA89iLcMSQQgGxWdhg+wz70AdpH4gtHlMn2q0+xeTHIs3ncne5UZGMAEgAHPJyO1Rnxh4dFobo6zZeQH8sv5wwGxnH5c/Sue1qP8Ati7nnitJ5LaeHTwA8DLuAuyWBUjPA5IPatWHT0/4TzU7lrQeW+mQR+YU4Y75dy578bcj6UAat94h0jTFha+1G2txMMxmSQDcPUe3I56Ul74i0fTpIo7zUraB5RuQSSAbh6/T36V53p1re6V9nnvJ9RtI59JtYY/IsBcHKIQ0TAoxU5OcHAOfapRYNo9lFFEmq29w+nJDsnshdx3KAuRE4QfKy7scFRgjrigDvtZ1VtNt7SWNFkE93DbnJxgO4XP4Zp1/qq2GoQxzSW0du0MssjyS7WUJjJC45HPJyMcetY2qQXMvh3QkNkYpkvLJpIIwWEWHXcOOw9faqfja2lvdXt7a3TfNNpGopGo/iYiIAfnQB18t/awTQxSzokkwJjVmwWwMnH0HNY91410GDSr/AFCLUbe4jsozJIsUgJ9gOe5GB2rnNXvl8Q6lpIt9J1Ga2jguluFkt3h5aEjy8sByeRnp05qr5V3f6bqNlaJdXsY0eeCJ7uwME0DEALFuwofPsONo55oA7S916JPC82tWRS4jWEyphuGx2zU1z4h0iyv47C61K2hu3xtieQBjnp9M9vWszVZP7S8AXRtoZi0loVWNomV84xjaQDnPtWLdOtnZeINJu9Mu7u81CeZ4RHbM6XCuMJ8+Nq7RhTuIxtz6UAdbfeItH0y6S2vtStredwCqSyBTg8A+wzUUeuxrf6rHdNHBbWCxsZmbAwy7iT6YrlNOJ8PQapYa3p93f3l3sIkitmmW7XyUTZuAwCCrDDYHOe9UrfRNWs7z7ZcrLdQ6fBZm4sthIuGSLDOp/jdCAQOhI9SCAD0a51OysrE3t1cxw2wAJlkbaoB6cmse+8b6DZ6fb332+GW3nuFt1eOQcMSAc88Yzk98VH4qnc6XZTwRMY/tKO8wtTO9uu0kOI+pOcDocZzjiuRWG883U794tRuYBqOn3Pmy2mx5ERhvYRqoPAHpnA+lAHpF3qtjYQRzXl3DBHJnY8jhQcKWPJ9gT+FUW8R2dzYpdaZd2dxGbiOBmafaAWYDGcH5ueB3OPWsjxpNGJ/DE72sk8SamJDGiEtgQSnIXqSOuOvHrWTqKS6vq0+qWFpcrZPc6dEd8DoZXjuNzPtYA4VSBuI7HsKAOyPiTR11H+zm1O1F7nb5BkG7OM4x647daraN4v0jWklNtdx5S5a2ALjLMC2MY9QpI9hWBp09vaaamhXmi3dzfi8LOv2dtkjGXd5/m4246NnOeMYzVVRPaQyF7K7J0/xBJeTKsDkmFy+GTA+cYcEhckYNAHc3WtabZNKt1fQQmIKZPMkC7Q2dpPpnB/I1VfxXoMZtxJq1ohuVDRBpQNyk4B+hPFcbeB9c8Q3N0mnXRs5LjTNjTW7KJFSWQs2CM4Ge49+hFM1C0uLPVvEMN3NqarqMm6JLWwWdbiMxhQm8odpBBGGIAznuaAO7u/EOkWN7HZ3Wo28NzJjbE8gDcnA+mT0qze6jaabaNdXtzHbwLjdJK20DPTk159qsE2nrdW8CX73UttFG1pPZ/aYL8iMKAXUfI3G0ncAMZxXR6+JIjouoS2ksttZ3Be4hiQyMmY2UMFHLbSR0Gec9qAE1zxvpmn6FHqFleWdwZp0t4S0wCb2YA7iMkBc5PGeK6O1lM1rFIzIxdAxKHKnI7e1ef3sUup3c+o2Vjcx2k2padt3wMjSGOXLy7SAQMFRkgfc9MV6KMY4oAKD0NFB6GgBluSbaIk5JQfyopLb/AI9Yf9xf5UUASUYFRXEpht3kWN5CqkhExlvYZIFcVY+Mr28g8PXc1pMn28Tb7eJAxkKqCu3ngdeSR0OaAO6x7UYGK59fF9g0QxDcm7+0G1+x+X+98wLuIxnGNvzZzjHetDTNZt9UWcRpLFLbyeXNDMm142wDyOnIIIIJBzQBoYHpRisGHxbYTyRFYrkWs03kRXhT9075wADnPJGASME9DyKRfFtk08Km3vFt5rg20V00WInkyRgc55IIBIwfWgDewPSlwK5GfxuJ9Liv9M068mhe6igDvGqht0oRsZYEkcj0zj3plt4ukg1DXY7q1vJ47KdeIIQfIjMMbfNzycluBk/pQB2OB6UmB6Vz3ifXZrHwx/aGmq0rTNEsciKrbRIyjdhiAeG498dqRPFVraGe3uUvCbKJXu55I1CxAoGBYjjJHZQee1AHR4qoNMtBqZ1Hys3fl+UJGYnauckAHgZIGcdcDPSsk+L7KJHa7tL20xbPcoJ4seaiDLbcE8gc7Tg+1NXxlZtNBALG/wDOuY/NtYzCAZ0HVl5wMZGd23qPWgDo8D0oxWXaa/Y3eiy6qGeK2hEnneapVozGSHDD1BU1SbxhZQ2k1zd215axx2xugZowN8QxlhgnpkZBweelAHQ4owPSsKLxVZtcLDcQ3NqXiaeJp49olRRliuCeQMHBwcdqbB4usXYfaYLqxR7d7mN7qPYrxrgsRycYBBwcH2oA38D0pMD0rDtvFVlO8QmgurRJommhkuY9iyIoySOeDjnDYOO3Bp1l4ntLy5toDb3Vv9rUtavPHtWYAZ+XnIOOcNg4zxwaANvA9KTA9KyNQ8RQWF3JarZ3l1LFEJpRbxbhGhzgkkjJODwMnjpVT/hM7CWVo7K3u71lto7o/Z4wf3TglW5I64PHX0FAG3cWNtdy20s8Qd7aTzYTkja20rn8mI/GrGB6VgnxZYy/ZxZQ3N8Z7cXSi2jztiPRjkjGecDqcHjis+w1+5vfAur
    }
   },
   "cell_type": "markdown",
   "id": "329fd4ee-4a68-4f3b-b157-a676f13ba587",
   "metadata": {},
   "source": [
    "![figure-8-1.jpg](attachment:5d505f36-17e1-4fe5-a405-f01f7a392716.jpg)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6fde6f17-d244-4270-b759-68e1858d399f",
   "metadata": {},
   "source": [
    "We can retrieve this image summary:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "id": "6f52ee1e-ed46-4a81-834a-3608a1cf90ce",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'The image features a close-up of a tray filled with various pieces of fried chicken. The chicken pieces are arranged in a way that resembles a map of the world, with some pieces placed in the shape of continents and others as countries. The arrangement of the chicken pieces creates a visually appealing and playful representation of the world, making it an interesting and creative presentation.\\n\\nmain: image encoded in   865.20 ms by CLIP (    1.50 ms per image patch)'"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "retriever.get_relevant_documents(\"Images / figures with playful and creative examples\")[1]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "69060724-e390-4dda-8250-5f86025c874a",
   "metadata": {},
   "source": [
    "## RAG\n",
    "\n",
    "Run [RAG pipeline](https://python.langchain.com/docs/expression_language/cookbook/retrieval).\n",
    "\n",
    "For `option 1` (above): \n",
    "\n",
    "* Simply pass retrieved text chunks to LLM, as usual.\n",
    "\n",
    "For `option 2a` (above): \n",
    "\n",
    "* We would pass retrieved image and images to the multi-modal LLM.\n",
    "* This should be possible soon, once [llama-cpp-python add multi-modal support](https://github.com/abetlen/llama-cpp-python/issues/813).\n",
    "* And, of course, this will be enabled by GPT4-V API."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "id": "771a47fa-1267-4db8-a6ae-5fde48bbc069",
   "metadata": {},
   "outputs": [],
   "source": [
    "from operator import itemgetter\n",
    "from langchain.schema.runnable import RunnablePassthrough\n",
    "\n",
    "# Prompt template\n",
    "template = \"\"\"Answer the question based only on the following context, which can include text and tables:\n",
    "{context}\n",
    "Question: {question}\n",
    "\"\"\"\n",
    "prompt = ChatPromptTemplate.from_template(template)\n",
    "\n",
    "# Option 1: LLM\n",
    "model = ChatOpenAI(temperature=0,model=\"gpt-4\")\n",
    "# Option 2: Multi-modal LLM\n",
    "# model = GPT4-V or LLaVA\n",
    "\n",
    "# RAG pipeline\n",
    "chain = (\n",
    "    {\"context\": retriever, \"question\": RunnablePassthrough()} \n",
    "    | prompt \n",
    "    | model \n",
    "    | StrOutputParser()\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "id": "ea8414a8-65ee-4e11-8154-029b454f46af",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'The performance of LLaMA across multiple image domains/subjects is as follows: In the Natural Science (NAT) subject, it scored 84.37. In the Social Science (SOC) subject, it scored 88.30. In the Language Science (LAN) subject, it scored 84.36. In the Text Context (TXT) subject, it scored 83.72. In the Image Context (IMG) subject, it scored 80.32. In the No Context (NO) subject, it scored 86.90. For grades 1-6 (G1-6), it scored 85.83 and for grades 7-12 (G7-12), it scored 84.05. The average score was 85.19.'"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chain.invoke(\"What is the performance of LLaVa across across multiple image domains / subjects?\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7ce57b80-fbd0-47f3-817f-6549a0409f51",
   "metadata": {},
   "source": [
    "We can check the [trace](https://smith.langchain.com/public/85a7180e-0dd1-44d9-996f-6cb9c6f53205/r) to see retrieval of tables and text."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "id": "e88f0bc7-81fb-4883-a021-58734a74411b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'The text provides an example of a playful and creative image. The image features a close-up of a tray filled with various pieces of fried chicken. The chicken pieces are arranged in a way that resembles a map of the world, with some pieces placed in the shape of continents and others as countries. The arrangement of the chicken pieces creates a visually appealing and playful representation of the world, making it an interesting and creative presentation.'"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chain.invoke(\"Explain images / figures with playful and creative examples.\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.16"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}