{
"cells": [
{
"cell_type": "markdown",
"id": "812a4dbc-fe04-4b84-bdf9-390045e30806",
"metadata": {},
"source": [
"# Use Case\n",
"\n",
"Many documents contain a mixture of content types: \n",
"\n",
"* `Semi-structured data`: text and tables\n",
"* `Image`: images contain valuable information \n",
"\n",
"Here, we show how `Unstructured` can be used to partition all 3 types from documents. \n",
"\n",
"We generate summaries of each content type, using an open source multi-modal model (LLaVA) for image summaries.\n",
"\n",
"We embed summaries and store them w/ the raw table and text chunks. \n",
"\n",
"For images, we embed and store the summaries only; future work could store the images for final answer generation."
]
},
{
"attachments": {
"28d8b949-8001-4b94-be3f-a64995390935.png": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAABlwAAAHcCAYAAACznQd3AAAMQGlDQ1BJQ0MgUHJvZmlsZQAASImVVwdYU8kWnluSkJAQIICAlNCbIFIDSAmhBZBeBBshCRBKjIGgYkcXFVy7iIANXRVR7IDYETuLYu+LBRVlXSzYlTcpoOu+8r35vrnz33/O/OfMuTP33gGAfpwnkeSimgDkiQukcaGBzNEpqUzSU0AEdEAFVkCLx8+XsGNiIgEsA+3fy7vrAJG3VxzlWv/s/69FSyDM5wOAxECcLsjn50G8HwC8mi+RFgBAlPMWkwskcgwr0JHCACFeIMeZSlwtx+lKvFthkxDHgbgVADUqjyfNBEDjEuSZhfxMqKHRC7GzWCASA0BnQuyXlzdRAHEaxLbQRgKxXJ+V/oNO5t800wc1ebzMQayci6KoBYnyJbm8qf9nOv53ycuVDfiwhpWaJQ2Lk88Z5u1mzsQIOaZC3CNOj4qGWBviDyKBwh5ilJIlC0tU2qNG/HwOzBnQg9hZwAuKgNgI4hBxblSkik/PEIVwIYYrBJ0iKuAmQKwP8QJhfnC8ymaDdGKcyhfakCHlsFX8WZ5U4Vfu674sJ5Gt0n+dJeSq9DGNoqyEZIgpEFsWipKiINaA2Ck/Jz5CZTOyKIsTNWAjlcXJ47eEOE4oDg1U6mOFGdKQOJV9aV7+wHyxDVkibpQK7y3ISghT5gdr5fMU8cO5YJeEYnbigI4wf3TkwFwEwqBg5dyxZ0JxYrxK54OkIDBOORanSHJjVPa4uTA3VM6bQ+yWXxivGosnFcAFqdTHMyQFMQnKOPGibF54jDIefCmIBBwQBJhABms6mAiygai9p7EH3il7QgAPSEEmEAJHFTMwIlnRI4bXeFAE/oRICPIHxwUqeoWgEPJfB1nl1RFkKHoLFSNywBOI80AEyIX3MsUo8aC3JPAYMqJ/eOfByofx5sIq7//3/AD7nWFDJlLFyAY8MukDlsRgYhAxjBhCtMMNcT/cB4+E1wBYXXAW7jUwj+/2hCeEDsJDwjVCJ+HWBFGx9KcoR4FOqB+iykX6j7nAraGmOx6I+0J1qIzr4YbAEXeDfti4P/TsDlmOKm55Vpg/af9tBj88DZUd2ZmMkoeQA8i2P4/UsNdwH1SR5/rH/ChjTR/MN2ew52f/nB+yL4BtxM+W2AJsH3YGO4Gdww5jjYCJHcOasDbsiBwPrq7HitU14C1OEU8O1BH9w9/Ak5VnMt+5zrnb+Yuyr0A4Rf6OBpyJkqlSUWZWAZMNvwhCJlfMdxrGdHF2cQVA/n1Rvr7exCq+G4he23du7h8A+B7r7+8/9J0LPwbAHk+4/Q9+52xZ8NOhDsDZg3yZtFDJ4fILAb4l6HCnGQATYAFs4XxcgAfwAQEgGISDaJAAUsB4GH0WXOdSMBlMB3NACSgDS8EqUAnWg01gG9gJ9oJGcBicAKfBBXAJXAN34OrpAi9AL3gHPiMIQkJoCAMxQEwRK8QBcUFYiB8SjEQicUgKkoZkImJEhkxH5iJlyHKkEtmI1CJ7kIPICeQc0oHcQh4g3chr5BOKoVRUBzVGrdHhKAtloxFoAjoOzUQnoUXoPHQxWoHWoDvQBvQEegG9hnaiL9A+DGDqmB5mhjliLIyDRWOpWAYmxWZipVg5VoPVY83wOV/BOrEe7CNOxBk4E3eEKzgMT8T5+CR8Jr4Ir8S34Q14K34Ff4D34t8INIIRwYHgTeASRhMyCZMJJYRywhbCAcIpuJe6CO+IRKIe0YboCfdiCjGbOI24iLiWuIt4nNhBfETsI5FIBiQHki8pmsQjFZBKSGtIO0jHSJdJXaQPaupqpmouaiFqqWpitWK1crXtakfVLqs9VftM1iRbkb3J0WQBeSp5CXkzuZl8kdxF/kzRothQfCkJlGzKHEoFpZ5yinKX8kZdXd1c3Us9Vl2kPlu9Qn23+ln1B+ofqdpUeyqHOpYqoy6mbqUep96ivqHRaNa0AFoqrYC2mFZLO0m7T/ugwdBw0uBqCDRmaVRpNGhc1nhJJ9Ot6Gz6eHoRvZy+j36R3qNJ1rTW5GjyNGdqVmke1Lyh2afF0BqhFa2Vp7VIa7vWOa1n2iRta+1gbYH2PO1N2ie1HzEwhgWDw+Az5jI2M04xunSIOjY6XJ1snTKdnTrtOr262rpuukm6U3SrdI/oduphetZ6XL1cvSV6e/Wu630aYjyEPUQ4ZOGQ+iGXh7zXH6ofoC/UL9XfpX9N/5MB0yDYIMdgmUGjwT1D3NDeMNZwsuE6w1OGPUN1hvoM5Q8tHbp36G0j1MjeKM5omtEmozajPmMT41BjifEa45PGPSZ6JgEm2SYrTY6adJsyTP1MRaYrTY+ZPmfqMtnMXGYFs5XZa2ZkFmYmM9to1m722dzGPNG82HyX+T0LigXLIsNipUWLRa+lqeUoy+mWdZa3rchWLKssq9VWZ6zeW9tYJ1vPt260fmajb8O1KbKps7lrS7P1t51kW2N71Y5ox7LLsVtrd8ketXe3z7Kvsr/ogDp4OIgc1jp0DCMM8xomHlYz7IYj1ZHtWOhY5/jASc8p0qnYqdHp5XDL4anDlw0/M/ybs7tzrvNm5zsjtEeEjyge0TzitYu9C9+lyuWqK801xHWWa5PrKzcHN6HbOreb7gz3Ue7z3Vvcv3p4ekg96j26PS090zyrPW+wdFgxrEWss14Er0CvWV6HvT56e3gXeO/1/svH0SfHZ7vPs5E2I4UjN4985Gvuy/Pd6Nvpx/RL89vg1+lv5s/zr/F/GGARIAjYEvCUbcfOZu9gvwx0DpQGHgh8z/HmzOAcD8KCQoNKg9qDtYMTgyuD74eYh2SG1IX0hrqHTgs9HkYIiwhbFnaDa8zlc2u5veGe4TPCWyOoEfERlREPI+0jpZHNo9BR4aNWjLobZRUljmqMBtHc6BXR92JsYibFHIolxsbEVsU+iRsRNz3uTDwjfkL89vh3CYEJSxLuJNomyhJbkuhJY5Nqk94nByUvT+4cPXz0jNEXUgxTRClNqaTUpNQtqX1jgsesGtM11n1sydjr42zGTRl3brzh+NzxRybQJ/Am7EsjpCWnbU/7wovm1fD60rnp1em9fA5/Nf+FIECwUtAt9BUuFz7N8M1YnvEs0zdzRWZ3ln9WeVaPiCOqFL3KDsten/0+Jzpna05/bnLurjy1vLS8g2JtcY64daLJxCkTOyQOkhJJ5yTvSasm9UojpFvykfxx+U0FOvBHvk1mK/tF9qDQr7Cq8MPkpMn7pmhNEU9pm2o/deHUp0UhRb9Nw6fxp7VMN5s+Z/qDGewZG2ciM9NntsyymDVvVtfs0Nnb5lDm5Mz5vdi5eHnx27nJc5vnGc+bPe/RL6G/1JVolEhLbsz3mb9+Ab5AtKB9oevCNQu/lQpKz5c5l5WXfVnEX3T+1xG/VvzavzhjcfsSjyXrlhKXipdeX+a/bNtyreVFyx+tGLWiYSVzZenKt6smrDpX7la+fjVltWx1Z0VkRdMayzVL13ypzKq8VhVYtavaqHph9fu1grWX1wWsq19vvL5s/
acNog03N4ZubKixrinfRNxUuOnJ5qTNZ35j/Va7xXBL2ZavW8VbO7fFbWut9ayt3W60fUkdWier694xdselnUE7m+od6zfu0ttVthvslu1+vidtz/W9EXtb9rH21e+32l99gHGgtAFpmNrQ25jV2NmU0tRxMPxgS7NP84FDToe2HjY7XHVE98iSo5Sj8472Hys61ndccrznROaJRy0TWu6cHH3yamtsa/upiFNnT4ecPnmGfebYWd+zh895nzt4nnW+8YLHhYY297YDv7v/fqDdo73houfFpktel5o7RnYcvex/+cSVoCunr3KvXrgWda3jeuL1mzfG3ui8Kbj57FburVe3C29/vjP7LuFu6T3Ne+X3je7X/GH3x65Oj84jD4IetD2Mf3jnEf/Ri8f5j790zXtCe1L+1PRp7TOXZ4e7Q7ovPR/zvOuF5MXnnpI/tf6sfmn7cv9fAX+19Y7u7XolfdX/etEbgzdb37q9bemL6bv/Lu/d5/elHww+bPvI+njmU/Knp58nfyF9qfhq97X5
}
},
"cell_type": "markdown",
"id": "7ea70aa4-9a53-4e17-8f11-1d14a0ac9b43",
"metadata": {},
"source": [
"![img_flow.png](attachment:28d8b949-8001-4b94-be3f-a64995390935.png)"
]
},
{
"cell_type": "markdown",
"id": "74b56bde-1ba0-4525-a11d-cab02c5659e4",
"metadata": {},
"source": [
"## Data Loading\n",
"\n",
"### Partition PDF tables, text, and images\n",
" \n",
"* `LLaVA` Paper: https://arxiv.org/pdf/2304.08485.pdf\n",
"* Use `Unstructured` to partition elements"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "61cbb874-ecc0-4d5d-9954-f0a41f65e0d7",
"metadata": {},
"outputs": [],
"source": [
"path = \"/Users/rlm/Desktop/Papers/LLaVA/\""
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "4aa9055d-1243-4b5a-aca0-2c6f8fb34143",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from lxml import html\n",
"from pydantic import BaseModel\n",
"from typing import Any, Optional\n",
"from unstructured.partition.pdf import partition_pdf\n",
"\n",
"# Get elements\n",
"raw_pdf_elements = partition_pdf(filename=path+\"LLaVA.pdf\",\n",
" # Using pdf format to find embedded image blocks\n",
" extract_images_in_pdf=True,\n",
" # Use layout model (YOLO-X) to get bounding boxes (for tables) and find titles\n",
" # Titles are any sub-section of the document \n",
" infer_table_structure=True, \n",
" # Post processing to aggregate text once we have the title \n",
" chunking_strategy=\"by_title\",\n",
" # Chunking params to aggregate text blocks\n",
" # Attempt to create a new chunk 3800 chars\n",
" # Attempt to keep chunks > 2000 chars \n",
" # Hard max on chunks\n",
" max_characters=4000, \n",
" new_after_n_chars=3800, \n",
" combine_text_under_n_chars=2000,\n",
" image_output_dir_path=path)"
]
},
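{
"cell_type": "markdown",
"id": "added-extraction-check",
"metadata": {},
"source": [
"As a quick sanity check (a sketch; extracted-image filenames such as `figure-*.jpg` depend on your `unstructured` version), you can confirm that image files landed in the output directory:\n",
"\n",
"```python\n",
"import glob\n",
"import os\n",
"\n",
"# List the images that partition_pdf extracted alongside the PDF\n",
"sorted(glob.glob(os.path.join(path, \"*.jpg\")))\n",
"```"
]
},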
{
"cell_type": "code",
"execution_count": 14,
"id": "7cdba921-5419-4471-b234-d93af3859b6f",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{\"<class 'unstructured.documents.elements.CompositeElement'>\": 31,\n",
" \"<class 'unstructured.documents.elements.Table'>\": 3}"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Create a dictionary to store counts of each type\n",
"category_counts = {}\n",
"\n",
"for element in raw_pdf_elements:\n",
" category = str(type(element))\n",
" if category in category_counts:\n",
" category_counts[category] += 1\n",
" else:\n",
" category_counts[category] = 1\n",
"\n",
"# Unique_categories will have unique elements\n",
"unique_categories = set(category_counts.keys())\n",
"category_counts"
]
},
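{
"cell_type": "markdown",
"id": "added-counter-note",
"metadata": {},
"source": [
"As an aside, the same tally can be written more compactly with `collections.Counter` from the standard library:\n",
"\n",
"```python\n",
"from collections import Counter\n",
"\n",
"# Counter builds the {type: count} mapping in one pass\n",
"category_counts = dict(Counter(str(type(el)) for el in raw_pdf_elements))\n",
"```"
]
},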
{
"cell_type": "code",
"execution_count": 15,
"id": "5f660305-e165-4b6c-ada3-a67a422defb5",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"3\n",
"31\n"
]
}
],
"source": [
"class Element(BaseModel):\n",
" type: str\n",
" text: Any\n",
"\n",
"# Categorize by type\n",
"categorized_elements = []\n",
"for element in raw_pdf_elements:\n",
" if \"unstructured.documents.elements.Table\" in str(type(element)):\n",
" categorized_elements.append(Element(type=\"table\", text=str(element)))\n",
" elif \"unstructured.documents.elements.CompositeElement\" in str(type(element)):\n",
" categorized_elements.append(Element(type=\"text\", text=str(element)))\n",
"\n",
"# Tables\n",
"table_elements = [e for e in categorized_elements if e.type == \"table\"]\n",
"print(len(table_elements))\n",
"\n",
"# Text\n",
"text_elements = [e for e in categorized_elements if e.type == \"text\"]\n",
"print(len(text_elements))"
]
},
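{
"cell_type": "markdown",
"id": "added-isinstance-note",
"metadata": {},
"source": [
"String-matching on type names works, but `isinstance` checks are sturdier. A sketch that reuses the `Element` model above, assuming your `unstructured` version exposes these classes at the same import path:\n",
"\n",
"```python\n",
"from unstructured.documents.elements import CompositeElement, Table\n",
"\n",
"categorized_elements = []\n",
"for element in raw_pdf_elements:\n",
"    if isinstance(element, Table):\n",
"        categorized_elements.append(Element(type=\"table\", text=str(element)))\n",
"    elif isinstance(element, CompositeElement):\n",
"        categorized_elements.append(Element(type=\"text\", text=str(element)))\n",
"```"
]
},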
{
"cell_type": "markdown",
"id": "0aa7f52f-bf5c-4ba4-af72-b2ccba59a4cf",
"metadata": {},
"source": [
"## Multi-vector retriever\n",
"\n",
"Use [multi-vector-retriever](https://python.langchain.com/docs/modules/data_connection/retrievers/multi_vector#summary).\n",
"\n",
"### Text and Table summaries"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "523e6ed2-2132-4748-bdb7-db765f20648d",
"metadata": {},
"outputs": [],
"source": [
"from langchain.chat_models import ChatOpenAI\n",
"from langchain.prompts import ChatPromptTemplate\n",
"from langchain.schema.output_parser import StrOutputParser"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "22c22e3f-42fb-4a4a-a87a-89f10ba8ab99",
"metadata": {},
"outputs": [],
"source": [
"# Prompt \n",
"prompt_text=\"\"\"You are an assistant tasked with summarizing tables and text. \\ \n",
"Give a concise summary of the table or text. Table or text chunk: {element} \"\"\"\n",
"prompt = ChatPromptTemplate.from_template(prompt_text) \n",
"\n",
"# Summary chain \n",
"model = ChatOpenAI(temperature=0,model=\"gpt-4\")\n",
"summarize_chain = {\"element\": lambda x:x} | prompt | model | StrOutputParser()"
]
},
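{
"cell_type": "markdown",
"id": "added-chain-smoke-test",
"metadata": {},
"source": [
"Before batching over every element, it can be worth smoke-testing the chain on a single element (a sketch, assuming at least one text element was extracted above):\n",
"\n",
"```python\n",
"# One-off summary of the first text chunk\n",
"print(summarize_chain.invoke(text_elements[0].text))\n",
"```"
]
},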
{
"cell_type": "code",
"execution_count": null,
"id": "f176b374-aef0-48f4-a104-fb26b1dd6922",
"metadata": {},
"outputs": [],
"source": [
"# Apply to text\n",
"texts = [i.text for i in text_elements]\n",
"text_summaries = summarize_chain.batch(texts, {\"max_concurrency\": 5})"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e79fb8a4-c46f-442a-adad-1de4419a06f2",
"metadata": {},
"outputs": [],
"source": [
"# Apply to tables\n",
"tables = [i.text for i in table_elements]\n",
"table_summaries = summarize_chain.batch(tables, {\"max_concurrency\": 5})"
]
},
{
"cell_type": "markdown",
"id": "d52641eb-762e-4460-80c7-3ac3ddd93621",
"metadata": {},
"source": [
"### Image summaries \n",
"\n",
"Use [llama.cpp](https://github.com/ggerganov/llama.cpp/pull/3436): \n",
"\n",
"* Download `mmproj-model-f16.gguf` and one of `ggml-model-[f16|q5_k|q4_k].gguf` from [LLaVA 7b repo](https://huggingface.co/mys/ggml_llava-v1.5-7b/tree/main)\n",
"* Clone `llama.cpp` repo\n",
"* Build\n",
"```\n",
"mkdir build && cd build && cmake ..\n",
"cmake --build .\n",
"```\n",
"\n",
"For [better performance](https://github.com/ggerganov/llama.cpp/issues/3602):\n",
"\n",
"* It appears `7b` is currently better than `13b` in the above LLaVA repo.\n",
"* Use `--temp 0.1`"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "35a45671-94bd-4bf9-a193-bd123ba5fc15",
"metadata": {},
"outputs": [],
"source": [
"%%bash\n",
"\n",
"# Define the directory containing the images\n",
"IMG_DIR=~/Desktop/Papers/LLaVA/\n",
"\n",
"# Loop through each image in the directory\n",
"for img in \"${IMG_DIR}\"*.jpg; do\n",
" # Extract the base name of the image without extension\n",
" base_name=$(basename \"$img\" .jpg)\n",
"\n",
" # Define the output file name based on the image name\n",
" output_file=\"${IMG_DIR}${base_name}.txt\"\n",
"\n",
" # Execute the command and save the output to the defined output file\n",
" /Users/rlm/Desktop/Code/llama.cpp/bin/llava -m ../models/llava-7b/ggml-model-q5_k.gguf --mmproj ../models/llava-7b/mmproj-model-f16.gguf --temp 0.1 -p \"Describe the image in detail. Be specific about graphs, such as bar plots.\" --image \"$img\" > \"$output_file\"\n",
"\n",
"done"
]
},
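{
"cell_type": "markdown",
"id": "added-subprocess-sketch",
"metadata": {},
"source": [
"If you prefer to stay in Python, here is a minimal sketch of the same loop using `subprocess`; the binary and model paths are the same placeholders as in the bash cell above and should be adjusted to your build:\n",
"\n",
"```python\n",
"import glob\n",
"import os\n",
"import subprocess\n",
"\n",
"LLAVA_BIN = \"/Users/rlm/Desktop/Code/llama.cpp/bin/llava\"  # placeholder path\n",
"MODEL = \"../models/llava-7b/ggml-model-q5_k.gguf\"\n",
"MMPROJ = \"../models/llava-7b/mmproj-model-f16.gguf\"\n",
"PROMPT = \"Describe the image in detail. Be specific about graphs, such as bar plots.\"\n",
"\n",
"for img in glob.glob(os.path.expanduser(\"~/Desktop/Papers/LLaVA/*.jpg\")):\n",
"    out_path = os.path.splitext(img)[0] + \".txt\"\n",
"    # Write llava's stdout (the image description) to <image>.txt\n",
"    with open(out_path, \"w\") as f:\n",
"        subprocess.run(\n",
"            [LLAVA_BIN, \"-m\", MODEL, \"--mmproj\", MMPROJ,\n",
"             \"--temp\", \"0.1\", \"-p\", PROMPT, \"--image\", img],\n",
"            stdout=f,\n",
"            check=True,\n",
"        )\n",
"```"
]
},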
{
"cell_type": "code",
"execution_count": 25,
"id": "da8a8c94-3df7-446f-9a69-703295f50f02",
"metadata": {},
"outputs": [],
"source": [
"import os, glob\n",
"\n",
"# Get all .txt file summaries\n",
"file_paths = glob.glob(os.path.expanduser(os.path.join(path, \"*.txt\")))\n",
"\n",
"# Read each file and store its content in a list\n",
"img_summaries = []\n",
"for file_path in file_paths:\n",
" with open(file_path, 'r') as file:\n",
" img_summaries.append(file.read())\n",
"\n",
"# Remove any logging prior to summary\n",
"logging_header=\"clip_model_load: total allocated memory: 201.27 MB\\n\\n\"\n",
"cleaned_img_summary = [s.split(logging_header, 1)[1].strip() for s in img_summaries]"
]
},
{
"cell_type": "markdown",
"id": "67b030d4-2ac5-41b6-9245-fc3ba5771d87",
"metadata": {},
"source": [
"### Add to vectorstore\n",
"\n",
"Use [Multi Vector Retriever](https://python.langchain.com/docs/modules/data_connection/retrievers/multi_vector#summary) with summaries."
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "d643cc61-827d-4f3c-8242-7a7c8291ed8a",
"metadata": {},
"outputs": [],
"source": [
"import uuid\n",
"from langchain.vectorstores import Chroma\n",
"from langchain.storage import InMemoryStore\n",
"from langchain.schema.document import Document\n",
"from langchain.embeddings import OpenAIEmbeddings\n",
"from langchain.retrievers.multi_vector import MultiVectorRetriever\n",
"\n",
"# The vectorstore to use to index the child chunks\n",
"vectorstore = Chroma(\n",
" collection_name=\"summaries\",\n",
" embedding_function=OpenAIEmbeddings()\n",
")\n",
"\n",
"# The storage layer for the parent documents\n",
"store = InMemoryStore()\n",
"id_key = \"doc_id\"\n",
"\n",
"# The retriever (empty to start)\n",
"retriever = MultiVectorRetriever(\n",
" vectorstore=vectorstore, \n",
" docstore=store, \n",
" id_key=id_key,\n",
")\n",
"\n",
"# Add texts\n",
"doc_ids = [str(uuid.uuid4()) for _ in texts]\n",
"summary_texts = [Document(page_content=s,metadata={id_key: doc_ids[i]}) for i, s in enumerate(text_summaries)]\n",
"retriever.vectorstore.add_documents(summary_texts)\n",
"retriever.docstore.mset(list(zip(doc_ids, texts)))\n",
"\n",
"# Add tables\n",
"table_ids = [str(uuid.uuid4()) for _ in tables]\n",
"summary_tables = [Document(page_content=s,metadata={id_key: table_ids[i]}) for i, s in enumerate(table_summaries)]\n",
"retriever.vectorstore.add_documents(summary_tables)\n",
"retriever.docstore.mset(list(zip(table_ids, tables)))\n",
"\n",
"# Add images\n",
"img_ids = [str(uuid.uuid4()) for _ in cleaned_img_summary]\n",
"summary_img = [Document(page_content=s,metadata={id_key: img_ids[i]}) for i, s in enumerate(cleaned_img_summary)]\n",
"retriever.vectorstore.add_documents(summary_img)\n",
"# Store the image summary as the raw document\n",
"# It may able be possible to store raw images and pass those to a final multi-modal LLM\n",
"retriever.docstore.mset(list(zip(img_ids, cleaned_img_summary))) "
]
},
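{
"cell_type": "markdown",
"id": "added-retriever-helper",
"metadata": {},
"source": [
"The three blocks above repeat the same summarize-then-link pattern. A small helper (hypothetical, not part of the LangChain API; it reuses `uuid`, `Document`, and the `retriever` defined above) makes the pattern explicit and would shrink the cell above to three calls:\n",
"\n",
"```python\n",
"def add_to_retriever(retriever, summaries, originals, id_key=\"doc_id\"):\n",
"    # Link each embedded summary to its raw document via a shared id\n",
"    ids = [str(uuid.uuid4()) for _ in originals]\n",
"    docs = [\n",
"        Document(page_content=s, metadata={id_key: ids[i]})\n",
"        for i, s in enumerate(summaries)\n",
"    ]\n",
"    retriever.vectorstore.add_documents(docs)\n",
"    retriever.docstore.mset(list(zip(ids, originals)))\n",
"\n",
"add_to_retriever(retriever, text_summaries, texts)\n",
"add_to_retriever(retriever, table_summaries, tables)\n",
"# For images, the summary doubles as the stored document\n",
"add_to_retriever(retriever, cleaned_img_summary, cleaned_img_summary)\n",
"```"
]
},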
{
"cell_type": "markdown",
"id": "4b45fb81-46b1-426e-aa2c-01aed4eac700",
"metadata": {},
"source": [
"### Sanity Check retrieval\n",
"\n",
"The most complex table in the paper:"
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "a5f4dd59-005a-4ff8-ad51-ea2e50d79c10",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Subject Context Modality Grade Method NAT SOC LAN | TXT IMG NO | Gi6~ G7-12 | Average Representative & SoTA methods with numbers reported in the literature Human [30] 90.23 84.97 87.48 | 89.60 87.50 88.10 | 91.59 82.42 88.40 GPT-3.5 [30] 74.64 69.74 76.00 | 74.44 67.28 77.42 | 76.80 68.89 73.97 GPT-3.5 w/ CoT [30] 75.44 70.87 78.09 | 74.68 67.43 79.93 | 78.23 69.68 75.17 LLaMA-Adapter [55] 84.37 88.30 84.36 | 83.72 80.32 86.90 | 85.83 84.05 85.19 MM-CoT gase [57] 87.52 77.17 85.82 | 87.88 82.90 86.83 | 84.65 85.37 84.91 MM-CoT farge [57] 95.91 82.00 90.82 | 95.26 88.80 92.89 | 92.44 90.31 | 91.68 Results with our own experiment runs GPT-4 84.06 73.45 87.36 | 81.87 70.75 90.73 | 84.69 79.10 82.69 LLaVA 90.36 95.95 88.00 | 89.49 88.00 90.66 | 90.93 90.90 90.92 LLaVA+GPT-4 (complement) 90.36 95.50 88.55 | 89.05 87.80 91.08 | 92.22 88.73 90.97 LLaVA+GPT-4 (judge) 91.56 96.74 91.09 | 90.62 88.99 93.52 | 92.73 92.16 92.53'"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tables[2]"
]
},
{
"cell_type": "markdown",
"id": "9f68ef8b-0fec-4b2f-a0d3-c440c74ebaa1",
"metadata": {},
"source": [
"Here is the summary, which is embedded:"
]
},
{
"cell_type": "code",
"execution_count": 35,
"id": "9eb16ea9-d932-4062-9ace-e8f77dee530b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'The table presents the performance of various methods in different subject contexts and modalities. The subjects are Natural Sciences (NAT), Social Sciences (SOC), and Language (LAN). The modalities are text (TXT), image (IMG), and no modality (NO). The methods include Human, GPT-3.5, GPT-3.5 with CoT, LLaMA-Adapter, MM-CoT gase, MM-CoT farge, GPT-4, LLaVA, LLaVA+GPT-4 (complement), and LLaVA+GPT-4 (judge). The performance is measured in grades from 6 to 12. The MM-CoT farge method had the highest performance in most categories, with LLaVA+GPT-4 (judge) showing the highest results in the experiment runs.'"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"table_summaries[2]"
]
},
{
"cell_type": "markdown",
"id": "fc2bcc4c-c05d-4417-aaf9-78acd754dde6",
"metadata": {},
"source": [
"Here is our retrieval of that table from the natural langugae query:"
]
},
{
"cell_type": "code",
"execution_count": 38,
"id": "1bea75fe-85af-4955-a80c-6e0b44a8e215",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Subject Context Modality Grade Method NAT SOC LAN | TXT IMG NO | Gi6~ G7-12 | Average Representative & SoTA methods with numbers reported in the literature Human [30] 90.23 84.97 87.48 | 89.60 87.50 88.10 | 91.59 82.42 88.40 GPT-3.5 [30] 74.64 69.74 76.00 | 74.44 67.28 77.42 | 76.80 68.89 73.97 GPT-3.5 w/ CoT [30] 75.44 70.87 78.09 | 74.68 67.43 79.93 | 78.23 69.68 75.17 LLaMA-Adapter [55] 84.37 88.30 84.36 | 83.72 80.32 86.90 | 85.83 84.05 85.19 MM-CoT gase [57] 87.52 77.17 85.82 | 87.88 82.90 86.83 | 84.65 85.37 84.91 MM-CoT farge [57] 95.91 82.00 90.82 | 95.26 88.80 92.89 | 92.44 90.31 | 91.68 Results with our own experiment runs GPT-4 84.06 73.45 87.36 | 81.87 70.75 90.73 | 84.69 79.10 82.69 LLaVA 90.36 95.95 88.00 | 89.49 88.00 90.66 | 90.93 90.90 90.92 LLaVA+GPT-4 (complement) 90.36 95.50 88.55 | 89.05 87.80 91.08 | 92.22 88.73 90.97 LLaVA+GPT-4 (judge) 91.56 96.74 91.09 | 90.62 88.99 93.52 | 92.73 92.16 92.53'"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# We can retrive this table\n",
"retriever.get_relevant_documents(\"What are results for LLaMA across across domains / subjects?\")[1]"
]
},
{
"cell_type": "markdown",
"id": "3dbb23d5-ae66-444d-8f5f-b24107fb9c57",
"metadata": {},
"source": [
"Image:"
]
},
{
"attachments": {
"5d505f36-17e1-4fe5-a405-f01f7a392716.jpg": {
"image/jpeg": "/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAgGBgcGBQgHBwcJCQgKDBQNDAsLDBkSEw8UHRofHh0aHBwgJC4nICIsIxwcKDcpLDAxNDQ0Hyc5PTgyPC4zNDL/2wBDAQkJCQwLDBgNDRgyIRwhMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjIyMjL/wAARCAE4AQUDASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSExBhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwD3qaVYYHlbO1FLHAycCuT0/wARa9f2Vrq8emWsul3W1kihnLXCxsRhiMbSRnJUHjnk4rq5zItvIYkDyBSVVjgE9hntXm8ssJhQaBpmqaV4geZS9rHFKsCtuG8vx5RTGfmHJ7c0AdzJ4j0eLUl06TUrVL0kKIDKA2T0GPU+lNfxLosd6tk+p2q3LOUERlG7cO2PWuPuSY9A1Tw8+m3cmq3NxMY3FsxjdnkLJN5mNoCgqSScjbj0p82ktJ4Y8SRSWLPJcaqXwYjmRd8eGHHIwDz7UAdfaeItHv4J57TUrWaK35mdJQQg65J7DHeqN3420K30e71OLUYLiG1HziKQFs9hz3OOPWub8ZaPeXmsTmyt5TGLGBpPKjDeYsdwGKAEbWO3OFPXp3qG5tptVtdZuLeXVL6X+y5IA89iLcMSQQgGxWdhg+wz70AdpH4gtHlMn2q0+xeTHIs3ncne5UZGMAEgAHPJyO1Rnxh4dFobo6zZeQH8sv5wwGxnH5c/Sue1qP8Ati7nnitJ5LaeHTwA8DLuAuyWBUjPA5IPatWHT0/4TzU7lrQeW+mQR+YU4Y75dy578bcj6UAat94h0jTFha+1G2txMMxmSQDcPUe3I56Ul74i0fTpIo7zUraB5RuQSSAbh6/T36V53p1re6V9nnvJ9RtI59JtYY/IsBcHKIQ0TAoxU5OcHAOfapRYNo9lFFEmq29w+nJDsnshdx3KAuRE4QfKy7scFRgjrigDvtZ1VtNt7SWNFkE93DbnJxgO4XP4Zp1/qq2GoQxzSW0du0MssjyS7WUJjJC45HPJyMcetY2qQXMvh3QkNkYpkvLJpIIwWEWHXcOOw9faqfja2lvdXt7a3TfNNpGopGo/iYiIAfnQB18t/awTQxSzokkwJjVmwWwMnH0HNY91410GDSr/AFCLUbe4jsozJIsUgJ9gOe5GB2rnNXvl8Q6lpIt9J1Ga2jguluFkt3h5aEjy8sByeRnp05qr5V3f6bqNlaJdXsY0eeCJ7uwME0DEALFuwofPsONo55oA7S916JPC82tWRS4jWEyphuGx2zU1z4h0iyv47C61K2hu3xtieQBjnp9M9vWszVZP7S8AXRtoZi0loVWNomV84xjaQDnPtWLdOtnZeINJu9Mu7u81CeZ4RHbM6XCuMJ8+Nq7RhTuIxtz6UAdbfeItH0y6S2vtStredwCqSyBTg8A+wzUUeuxrf6rHdNHBbWCxsZmbAwy7iT6YrlNOJ8PQapYa3p93f3l3sIkitmmW7XyUTZuAwCCrDDYHOe9UrfRNWs7z7ZcrLdQ6fBZm4sthIuGSLDOp/jdCAQOhI9SCAD0a51OysrE3t1cxw2wAJlkbaoB6cmse+8b6DZ6fb332+GW3nuFt1eOQcMSAc88Yzk98VH4qnc6XZTwRMY/tKO8wtTO9uu0kOI+pOcDocZzjiuRWG883U794tRuYBqOn3Pmy2mx5ERhvYRqoPAHpnA+lAHpF3qtjYQRzXl3DBHJnY8jhQcKWPJ9gT+FUW8R2dzYpdaZd2dxGbiOBmafaAWYDGcH5ueB3OPWsjxpNGJ/DE72sk8SamJDGiEtgQSnIXqSOuOvHrWTqKS6vq0+qWFpcrZPc6dEd8DoZXjuNzPtYA4VSBuI7HsKAOyPiTR11H+zm1O1F7nb5BkG7OM4x647daraN4v0jWklNtdx5S5a2ALjLMC2MY9QpI9hWBp09vaaamhXmi3dzfi8LOv2dtkjGXd5/m4246NnOeMYzVVRPaQyF7K7J0/xBJeTKsDkmFy+GTA+cYcEhckYNAHc3WtabZNKt1fQQmIKZPMkC7Q2dpPpnB/I1VfxXoMZtxJq1ohuVDRBpQNyk4B+hPFcbeB9c8Q3N0mnXRs5LjTNjTW7KJFSWQs2CM4Ge49+hFM1C0uLPVvEMN3NqarqMm6JLWwWdbiMxhQm8odpBBGGIAznuaAO7u/EOkWN7HZ3Wo28NzJjbE8gDcnA+mT0qze6jaabaNdXtzHbwLjdJK20DPTk159qsE2nrdW8CX73UttFG1pPZ/aYL8iMKAXUfI3G0ncAMZxXR6+JIjouoS2ksttZ3Be4hiQyMmY2UMFHLbSR0Gec9qAE1zxvpmn6FHqFleWdwZp0t4S0wCb2YA7iMkBc5PGeK6O1lM1rFIzIxdAxKHKnI7e1ef3sUup3c+o2Vjcx2k2padt3wMjSGOXLy7SAQMFRkgfc9MV6KMY4oAKD0NFB6GgBluSbaIk5JQfyopLb/AI9Yf9xf5UUASUYFRXEpht3kWN5CqkhExlvYZIFcVY+Mr28g8PXc1pMn28Tb7eJAxkKqCu3ngdeSR0OaAO6x7UYGK59fF9g0QxDcm7+0G1+x+X+98wLuIxnGNvzZzjHetDTNZt9UWcRpLFLbyeXNDMm142wDyOnIIIIJBzQBoYHpRisGHxbYTyRFYrkWs03kRXhT9075wADnPJGASME9DyKRfFtk08Km3vFt5rg20V00WInkyRgc55IIBIwfWgDewPSlwK5GfxuJ9Liv9M068mhe6igDvGqht0oRsZYEkcj0zj3plt4ukg1DXY7q1vJ47KdeIIQfIjMMbfNzycluBk/pQB2OB6UmB6Vz3ifXZrHwx/aGmq0rTNEsciKrbRIyjdhiAeG498dqRPFVraGe3uUvCbKJXu55I1CxAoGBYjjJHZQee1AHR4qoNMtBqZ1Hys3fl+UJGYna
uckAHgZIGcdcDPSsk+L7KJHa7tL20xbPcoJ4seaiDLbcE8gc7Tg+1NXxlZtNBALG/wDOuY/NtYzCAZ0HVl5wMZGd23qPWgDo8D0oxWXaa/Y3eiy6qGeK2hEnneapVozGSHDD1BU1SbxhZQ2k1zd215axx2xugZowN8QxlhgnpkZBweelAHQ4owPSsKLxVZtcLDcQ3NqXiaeJp49olRRliuCeQMHBwcdqbB4usXYfaYLqxR7d7mN7qPYrxrgsRycYBBwcH2oA38D0pMD0rDtvFVlO8QmgurRJommhkuY9iyIoySOeDjnDYOO3Bp1l4ntLy5toDb3Vv9rUtavPHtWYAZ+XnIOOcNg4zxwaANvA9KTA9KyNQ8RQWF3JarZ3l1LFEJpRbxbhGhzgkkjJODwMnjpVT/hM7CWVo7K3u71lto7o/Z4wf3TglW5I64PHX0FAG3cWNtdy20s8Qd7aTzYTkja20rn8mI/GrGB6VgnxZYy/ZxZQ3N8Z7cXSi2jztiPRjkjGecDqcHjis+w1+5vfAur
}
},
"cell_type": "markdown",
"id": "329fd4ee-4a68-4f3b-b157-a676f13ba587",
"metadata": {},
"source": [
"![figure-8-1.jpg](attachment:5d505f36-17e1-4fe5-a405-f01f7a392716.jpg)"
]
},
{
"cell_type": "markdown",
"id": "6fde6f17-d244-4270-b759-68e1858d399f",
"metadata": {},
"source": [
"We can retrieve this image summary:"
]
},
{
"cell_type": "code",
"execution_count": 41,
"id": "6f52ee1e-ed46-4a81-834a-3608a1cf90ce",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'The image features a close-up of a tray filled with various pieces of fried chicken. The chicken pieces are arranged in a way that resembles a map of the world, with some pieces placed in the shape of continents and others as countries. The arrangement of the chicken pieces creates a visually appealing and playful representation of the world, making it an interesting and creative presentation.\\n\\nmain: image encoded in 865.20 ms by CLIP ( 1.50 ms per image patch)'"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"retriever.get_relevant_documents(\"Images / figures with playful and creative examples\")[1]"
]
},
{
"cell_type": "markdown",
"id": "69060724-e390-4dda-8250-5f86025c874a",
"metadata": {},
"source": [
"## RAG\n",
"\n",
"Run [RAG pipeline](https://python.langchain.com/docs/expression_language/cookbook/retrieval)."
]
},
{
"cell_type": "code",
"execution_count": 42,
"id": "771a47fa-1267-4db8-a6ae-5fde48bbc069",
"metadata": {},
"outputs": [],
"source": [
"from operator import itemgetter\n",
"from langchain.schema.runnable import RunnablePassthrough\n",
"\n",
"# Prompt template\n",
"template = \"\"\"Answer the question based only on the following context, which can include text and tables:\n",
"{context}\n",
"Question: {question}\n",
"\"\"\"\n",
"prompt = ChatPromptTemplate.from_template(template)\n",
"\n",
"# LLM\n",
"model = ChatOpenAI(temperature=0,model=\"gpt-4\")\n",
"\n",
"# RAG pipeline\n",
"chain = (\n",
" {\"context\": retriever, \"question\": RunnablePassthrough()} \n",
" | prompt \n",
" | model \n",
" | StrOutputParser()\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 43,
"id": "ea8414a8-65ee-4e11-8154-029b454f46af",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'The performance of LLaMA across multiple image domains/subjects is as follows: In the Natural Science (NAT) subject, it scored 84.37. In the Social Science (SOC) subject, it scored 88.30. In the Language Science (LAN) subject, it scored 84.36. In the Text Context (TXT) subject, it scored 83.72. In the Image Context (IMG) subject, it scored 80.32. In the No Context (NO) subject, it scored 86.90. For grades 1-6 (G1-6), it scored 85.83 and for grades 7-12 (G7-12), it scored 84.05. The average score was 85.19.'"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.invoke(\"What is the performance of LLaVa across across mutiple image domains / subjects?\")"
]
},
{
"cell_type": "markdown",
"id": "7ce57b80-fbd0-47f3-817f-6549a0409f51",
"metadata": {},
"source": [
"We can check the [trace](https://smith.langchain.com/public/85a7180e-0dd1-44d9-996f-6cb9c6f53205/r) to see retrieval of tables and text."
]
},
{
"cell_type": "code",
"execution_count": 49,
"id": "e88f0bc7-81fb-4883-a021-58734a74411b",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'The text provides an example of a playful and creative image. The image features a close-up of a tray filled with various pieces of fried chicken. The chicken pieces are arranged in a way that resembles a map of the world, with some pieces placed in the shape of continents and others as countries. The arrangement of the chicken pieces creates a visually appealing and playful representation of the world, making it an interesting and creative presentation.'"
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chain.invoke(\"Explain images / figures with playful and creative examples.\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
}
},
"nbformat": 4,
"nbformat_minor": 5
}