langchain/cookbook/nomic_multimodal_rag.ipynb

{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "9fc3897d-176f-4729-8fd1-cfb4add53abd",
   "metadata": {},
   "source": [
    "## Nomic multi-modal RAG\n",
    "\n",
    "Many documents contain a mixture of content types, including text and images. \n",
    "\n",
    "Yet, information captured in images is lost in most RAG applications.\n",
    "\n",
    "With the emergence of multimodal LLMs, like [GPT-4V](https://openai.com/research/gpt-4v-system-card), it is worth considering how to utilize images in RAG:\n",
    "\n",
    "In this demo we\n",
    "\n",
    "* Use multimodal embeddings from Nomic Embed [Vision](https://huggingface.co/nomic-ai/nomic-embed-vision-v1.5) and [Text](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5) to embed images and text\n",
    "* Retrieve both using similarity search\n",
    "* Pass raw images and text chunks to a multimodal LLM for answer synthesis \n",
    "\n",
    "## Signup\n",
    "\n",
    "Get your API token, then run:\n",
    "```\n",
    "! nomic login\n",
    "```\n",
    "\n",
    "Then run with your generated API token \n",
    "```\n",
    "! nomic login < token > \n",
    "```\n",
    "\n",
    "## Packages\n",
    "\n",
    "For `unstructured`, you will also need `poppler` ([installation instructions](https://pdf2image.readthedocs.io/en/latest/installation.html)) and `tesseract` ([installation instructions](https://tesseract-ocr.github.io/tessdoc/Installation.html)) in your system."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "54926b9b-75c2-4cd4-8f14-b3882a0d370b",
   "metadata": {},
   "outputs": [],
   "source": [
    "! nomic login token"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "febbc459-ebba-4c1a-a52b-fed7731593f8",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "! pip install -U langchain-nomic langchain_community tiktoken langchain-openai chromadb langchain # (newest versions required for multi-modal)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "acbdc603-39e2-4a5f-836c-2bbaecd46b0b",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# lock to 0.10.19 due to a persistent bug in more recent versions\n",
    "! pip install \"unstructured[all-docs]==0.10.19\" pillow pydantic lxml pillow matplotlib tiktoken"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1e94b3fb-8e3e-4736-be0a-ad881626c7bd",
   "metadata": {},
   "source": [
    "## Data Loading\n",
    "\n",
    "### Partition PDF text and images\n",
    "  \n",
    "Let's look at an example pdfs containing interesting images.\n",
    "\n",
    "1/ Art from the J Paul Getty museum:\n",
    "\n",
    " * Here is a [zip file](https://drive.google.com/file/d/18kRKbq2dqAhhJ3DfZRnYcTBEUfYxe1YR/view?usp=sharing) with the PDF and the already extracted images. \n",
    "* https://www.getty.edu/publications/resources/virtuallibrary/0892360224.pdf\n",
    "\n",
    "2/ Famous photographs from library of congress:\n",
    "\n",
    "* https://www.loc.gov/lcm/pdf/LCM_2020_1112.pdf\n",
    "* We'll use this as an example below\n",
    "\n",
    "We can use `partition_pdf` below from [Unstructured](https://unstructured-io.github.io/unstructured/introduction.html#key-concepts) to extract text and images.\n",
    "\n",
    "To supply this to extract the images:\n",
    "```\n",
    "extract_images_in_pdf=True\n",
    "```\n",
    "\n",
    "\n",
    "\n",
    "If using this zip file, then you can simply process the text only with:\n",
    "```\n",
    "extract_images_in_pdf=False\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9646b524-71a7-4b2a-bdc8-0b81f77e968f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Folder with pdf and extracted images\n",
    "from pathlib import Path\n",
    "\n",
    "# replace with actual path to images\n",
    "path = Path(\"../art\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "77f096ab-a933-41d0-8f4e-1efc83998fc3",
   "metadata": {},
   "outputs": [],
   "source": [
    "path.resolve()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bc4839c0-8773-4a07-ba59-5364501269b2",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Extract images, tables, and chunk text\n",
    "from unstructured.partition.pdf import partition_pdf\n",
    "\n",
    "raw_pdf_elements = partition_pdf(\n",
    "    filename=str(path.resolve()) + \"/getty.pdf\",\n",
    "    extract_images_in_pdf=False,\n",
    "    infer_table_structure=True,\n",
    "    chunking_strategy=\"by_title\",\n",
    "    max_characters=4000,\n",
    "    new_after_n_chars=3800,\n",
    "    combine_text_under_n_chars=2000,\n",
    "    image_output_dir_path=path,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "969545ad",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Categorize text elements by type\n",
    "tables = []\n",
    "texts = []\n",
    "for element in raw_pdf_elements:\n",
    "    if \"unstructured.documents.elements.Table\" in str(type(element)):\n",
    "        tables.append(str(element))\n",
    "    elif \"unstructured.documents.elements.CompositeElement\" in str(type(element)):\n",
    "        texts.append(str(element))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5d8e6349-1547-4cbf-9c6f-491d8610ec10",
   "metadata": {},
   "source": [
    "## Multi-modal embeddings with our document\n",
    "\n",
    "We will use [nomic-embed-vision-v1.5](https://huggingface.co/nomic-ai/nomic-embed-vision-v1.5) embeddings. This model is aligned \n",
    "to [nomic-embed-text-v1.5](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5) allowing for multimodal semantic search and Multimodal RAG!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4bc15842-cb95-4f84-9eb5-656b0282a800",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import uuid\n",
    "\n",
    "import chromadb\n",
    "import numpy as np\n",
    "from langchain_community.vectorstores import Chroma\n",
    "from langchain_nomic import NomicEmbeddings\n",
    "from PIL import Image as _PILImage\n",
    "\n",
    "# Create chroma\n",
    "text_vectorstore = Chroma(\n",
    "    collection_name=\"mm_rag_clip_photos_text\",\n",
    "    embedding_function=NomicEmbeddings(\n",
    "        vision_model=\"nomic-embed-vision-v1.5\", model=\"nomic-embed-text-v1.5\"\n",
    "    ),\n",
    ")\n",
    "image_vectorstore = Chroma(\n",
    "    collection_name=\"mm_rag_clip_photos_image\",\n",
    "    embedding_function=NomicEmbeddings(\n",
    "        vision_model=\"nomic-embed-vision-v1.5\", model=\"nomic-embed-text-v1.5\"\n",
    "    ),\n",
    ")\n",
    "\n",
    "# Get image URIs with .jpg extension only\n",
    "image_uris = sorted(\n",
    "    [\n",
    "        os.path.join(path, image_name)\n",
    "        for image_name in os.listdir(path)\n",
    "        if image_name.endswith(\".jpg\")\n",
    "    ]\n",
    ")\n",
    "\n",
    "# Add images\n",
    "image_vectorstore.add_images(uris=image_uris)\n",
    "\n",
    "# Add documents\n",
    "text_vectorstore.add_texts(texts=texts)\n",
    "\n",
    "# Make retriever\n",
    "image_retriever = image_vectorstore.as_retriever()\n",
    "text_retriever = text_vectorstore.as_retriever()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "02a186d0-27e0-4820-8092-63b5349dd25d",
   "metadata": {},
   "source": [
    "## RAG\n",
    "\n",
    "`vectorstore.add_images` will store / retrieve images as base64 encoded strings.\n",
    "\n",
    "These can be passed to [GPT-4V](https://platform.openai.com/docs/guides/vision)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "344f56a8-0dc3-433e-851c-3f7600c7a72b",
   "metadata": {},
   "outputs": [],
   "source": [
    "import base64\n",
    "import io\n",
    "from io import BytesIO\n",
    "\n",
    "import numpy as np\n",
    "from PIL import Image\n",
    "\n",
    "\n",
    "def resize_base64_image(base64_string, size=(128, 128)):\n",
    "    \"\"\"\n",
    "    Resize an image encoded as a Base64 string.\n",
    "\n",
    "    Args:\n",
    "    base64_string (str): Base64 string of the original image.\n",
    "    size (tuple): Desired size of the image as (width, height).\n",
    "\n",
    "    Returns:\n",
    "    str: Base64 string of the resized image.\n",
    "    \"\"\"\n",
    "    # Decode the Base64 string\n",
    "    img_data = base64.b64decode(base64_string)\n",
    "    img = Image.open(io.BytesIO(img_data))\n",
    "\n",
    "    # Resize the image\n",
    "    resized_img = img.resize(size, Image.LANCZOS)\n",
    "\n",
    "    # Save the resized image to a bytes buffer\n",
    "    buffered = io.BytesIO()\n",
    "    resized_img.save(buffered, format=img.format)\n",
    "\n",
    "    # Encode the resized image to Base64\n",
    "    return base64.b64encode(buffered.getvalue()).decode(\"utf-8\")\n",
    "\n",
    "\n",
    "def is_base64(s):\n",
    "    \"\"\"Check if a string is Base64 encoded\"\"\"\n",
    "    try:\n",
    "        return base64.b64encode(base64.b64decode(s)) == s.encode()\n",
    "    except Exception:\n",
    "        return False\n",
    "\n",
    "\n",
    "def split_image_text_types(docs):\n",
    "    \"\"\"Split numpy array images and texts\"\"\"\n",
    "    images = []\n",
    "    text = []\n",
    "    for doc in docs:\n",
    "        doc = doc.page_content  # Extract Document contents\n",
    "        if is_base64(doc):\n",
    "            # Resize image to avoid OAI server error\n",
    "            images.append(\n",
    "                resize_base64_image(doc, size=(250, 250))\n",
    "            )  # base64 encoded str\n",
    "        else:\n",
    "            text.append(doc)\n",
    "    return {\"images\": images, \"texts\": text}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "23a2c1d8-fea6-4152-b184-3172dd46c735",
   "metadata": {},
   "source": [
    "Currently, we format the inputs using a `RunnableLambda` while we add image support to `ChatPromptTemplates`.\n",
    "\n",
    "Our runnable follows the classic RAG flow - \n",
    "\n",
    "* We first compute the context (both \"texts\" and \"images\" in this case) and the question (just a RunnablePassthrough here) \n",
    "* Then we pass this into our prompt template, which is a custom function that formats the message for the gpt-4-vision-preview model. \n",
    "* And finally we parse the output as a string."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5d8919dc-c238-4746-86ba-45d940a7d260",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "os.environ[\"OPENAI_API_KEY\"] = \"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4c93fab3-74c4-4f1d-958a-0bc4cdd0797e",
   "metadata": {},
   "outputs": [],
   "source": [
    "from operator import itemgetter\n",
    "\n",
    "from langchain_core.messages import HumanMessage, SystemMessage\n",
    "from langchain_core.output_parsers import StrOutputParser\n",
    "from langchain_core.runnables import RunnableLambda, RunnablePassthrough\n",
    "from langchain_openai import ChatOpenAI\n",
    "\n",
    "\n",
    "def prompt_func(data_dict):\n",
    "    # Joining the context texts into a single string\n",
    "    formatted_texts = \"\\n\".join(data_dict[\"text_context\"][\"texts\"])\n",
    "    messages = []\n",
    "\n",
    "    # Adding image(s) to the messages if present\n",
    "    if data_dict[\"image_context\"][\"images\"]:\n",
    "        image_message = {\n",
    "            \"type\": \"image_url\",\n",
    "            \"image_url\": {\n",
    "                \"url\": f\"data:image/jpeg;base64,{data_dict['image_context']['images'][0]}\"\n",
    "            },\n",
    "        }\n",
    "        messages.append(image_message)\n",
    "\n",
    "    # Adding the text message for analysis\n",
    "    text_message = {\n",
    "        \"type\": \"text\",\n",
    "        \"text\": (\n",
    "            \"As an expert art critic and historian, your task is to analyze and interpret images, \"\n",
    "            \"considering their historical and cultural significance. Alongside the images, you will be \"\n",
    "            \"provided with related text to offer context. Both will be retrieved from a vectorstore based \"\n",
    "            \"on user-input keywords. Please use your extensive knowledge and analytical skills to provide a \"\n",
    "            \"comprehensive summary that includes:\\n\"\n",
    "            \"- A detailed description of the visual elements in the image.\\n\"\n",
    "            \"- The historical and cultural context of the image.\\n\"\n",
    "            \"- An interpretation of the image's symbolism and meaning.\\n\"\n",
    "            \"- Connections between the image and the related text.\\n\\n\"\n",
    "            f\"User-provided keywords: {data_dict['question']}\\n\\n\"\n",
    "            \"Text and / or tables:\\n\"\n",
    "            f\"{formatted_texts}\"\n",
    "        ),\n",
    "    }\n",
    "    messages.append(text_message)\n",
    "\n",
    "    return [HumanMessage(content=messages)]\n",
    "\n",
    "\n",
    "model = ChatOpenAI(temperature=0, model=\"gpt-4-vision-preview\", max_tokens=1024)\n",
    "\n",
    "# RAG pipeline\n",
    "chain = (\n",
    "    {\n",
    "        \"text_context\": text_retriever | RunnableLambda(split_image_text_types),\n",
    "        \"image_context\": image_retriever | RunnableLambda(split_image_text_types),\n",
    "        \"question\": RunnablePassthrough(),\n",
    "    }\n",
    "    | RunnableLambda(prompt_func)\n",
    "    | model\n",
    "    | StrOutputParser()\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1566096d-97c2-4ddc-ba4a-6ef88c525e4e",
   "metadata": {},
   "source": [
    "## Test retrieval and run RAG"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "90121e56-674b-473b-871d-6e4753fd0c45",
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.display import HTML, display\n",
    "\n",
    "\n",
    "def plt_img_base64(img_base64):\n",
    "    # Create an HTML img tag with the base64 string as the source\n",
    "    image_html = f'<img src=\"data:image/jpeg;base64,{img_base64}\" />'\n",
    "\n",
    "    # Display the image by rendering the HTML\n",
    "    display(HTML(image_html))\n",
    "\n",
    "\n",
    "docs = text_retriever.invoke(\"Women with children\", k=5)\n",
    "for doc in docs:\n",
    "    if is_base64(doc.page_content):\n",
    "        plt_img_base64(doc.page_content)\n",
    "    else:\n",
    "        print(doc.page_content)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "44eaa532-f035-4c04-b578-02339d42554c",
   "metadata": {},
   "outputs": [],
   "source": [
    "docs = image_retriever.invoke(\"Women with children\", k=5)\n",
    "for doc in docs:\n",
    "    if is_base64(doc.page_content):\n",
    "        plt_img_base64(doc.page_content)\n",
    "    else:\n",
    "        print(doc.page_content)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "69fb15fd-76fc-49b4-806d-c4db2990027d",
   "metadata": {},
   "outputs": [],
   "source": [
    "chain.invoke(\"Women with children\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "227f08b8-e732-4089-b65c-6eb6f9e48f15",
   "metadata": {},
   "source": [
    "We can see the images retrieved in the LangSmith trace:\n",
    "\n",
    "LangSmith [trace](https://smith.langchain.com/public/69c558a5-49dc-4c60-a49b-3adbb70f74c5/r/e872c2c8-528c-468f-aefd-8b5cd730a673)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
embeddings: nomic embed vision (#22482) Thank you for contributing to LangChain! Description: Adds Langchain support for Nomic Embed Vision Twitter handle: nomic_ai,zach_nussbaum - [x] Add tests and docs: If you're adding a new integration, please include 1. a test for the integration, preferably unit tests that do not rely on network access, 2. an example notebook showing its use. It lives in `docs/docs/integrations` directory. - [ ] Lint and test: Run `make format`, `make lint` and `make test` from the root of the package(s) you've modified. See contribution guidelines for more: https://python.langchain.com/docs/contributing/ Additional guidelines: - Make sure optional dependencies are imported within a function. - Please do not add dependencies to pyproject.toml files (even optional ones) unless they are required for unit tests. - Most PRs should not touch more than one package. - Changes should be backwards compatible. - If you are adding something to community, do not re-import it in langchain. If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, ccurme, vbarda, hwchase17. --------- Co-authored-by: Lance Martin <122662504+rlancemartin@users.noreply.github.com> Co-authored-by: Bagatur <baskaryan@gmail.com> 2024-06-05 16:47:17 +00:00			`{`
			`"cells": [`
			`{`
			`"attachments": {},`
			`"cell_type": "markdown",`
			`"id": "9fc3897d-176f-4729-8fd1-cfb4add53abd",`
			`"metadata": {},`
			`"source": [`
			`"## Nomic multi-modal RAG\n",`
			`"\n",`
			`"Many documents contain a mixture of content types, including text and images. \n",`
			`"\n",`
			`"Yet, information captured in images is lost in most RAG applications.\n",`
			`"\n",`
			`"With the emergence of multimodal LLMs, like [GPT-4V](https://openai.com/research/gpt-4v-system-card), it is worth considering how to utilize images in RAG:\n",`
			`"\n",`
			`"In this demo we\n",`
			`"\n",`
			`"* Use multimodal embeddings from Nomic Embed [Vision](https://huggingface.co/nomic-ai/nomic-embed-vision-v1.5) and [Text](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5) to embed images and text\n",`
			`"* Retrieve both using similarity search\n",`
			`"* Pass raw images and text chunks to a multimodal LLM for answer synthesis \n",`
			`"\n",`
			`"## Signup\n",`
			`"\n",`
			`"Get your API token, then run:\n",`
			"```\n",
			`"! nomic login\n",`
			"```\n",
			`"\n",`
			`"Then run with your generated API token \n",`
			"```\n",
			`"! nomic login < token > \n",`
			"```\n",
			`"\n",`
			`"## Packages\n",`
			`"\n",`
			"For `unstructured`, you will also need `poppler` ([installation instructions](https://pdf2image.readthedocs.io/en/latest/installation.html)) and `tesseract` ([installation instructions](https://tesseract-ocr.github.io/tessdoc/Installation.html)) in your system."
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": null,`
			`"id": "54926b9b-75c2-4cd4-8f14-b3882a0d370b",`
			`"metadata": {},`
			`"outputs": [],`
			`"source": [`
			`"! nomic login token"`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": null,`
			`"id": "febbc459-ebba-4c1a-a52b-fed7731593f8",`
			`"metadata": {`
			`"scrolled": true`
			`},`
			`"outputs": [],`
			`"source": [`
			`"! pip install -U langchain-nomic langchain_community tiktoken langchain-openai chromadb langchain # (newest versions required for multi-modal)"`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": null,`
			`"id": "acbdc603-39e2-4a5f-836c-2bbaecd46b0b",`
			`"metadata": {`
			`"scrolled": true`
			`},`
			`"outputs": [],`
			`"source": [`
			`"# lock to 0.10.19 due to a persistent bug in more recent versions\n",`
			`"! pip install \"unstructured[all-docs]==0.10.19\" pillow pydantic lxml pillow matplotlib tiktoken"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"id": "1e94b3fb-8e3e-4736-be0a-ad881626c7bd",`
			`"metadata": {},`
			`"source": [`
			`"## Data Loading\n",`
			`"\n",`
			`"### Partition PDF text and images\n",`
			`" \n",`
			`"Let's look at an example pdfs containing interesting images.\n",`
			`"\n",`
			`"1/ Art from the J Paul Getty museum:\n",`
			`"\n",`
			`" * Here is a [zip file](https://drive.google.com/file/d/18kRKbq2dqAhhJ3DfZRnYcTBEUfYxe1YR/view?usp=sharing) with the PDF and the already extracted images. \n",`
			`"* https://www.getty.edu/publications/resources/virtuallibrary/0892360224.pdf\n",`
			`"\n",`
			`"2/ Famous photographs from library of congress:\n",`
			`"\n",`
			`"* https://www.loc.gov/lcm/pdf/LCM_2020_1112.pdf\n",`
			`"* We'll use this as an example below\n",`
			`"\n",`
			"We can use `partition_pdf` below from [Unstructured](https://unstructured-io.github.io/unstructured/introduction.html#key-concepts) to extract text and images.\n",
			`"\n",`
			`"To supply this to extract the images:\n",`
			"```\n",
			`"extract_images_in_pdf=True\n",`
			"```\n",
			`"\n",`
			`"\n",`
			`"\n",`
			`"If using this zip file, then you can simply process the text only with:\n",`
			"```\n",
			`"extract_images_in_pdf=False\n",`
			"```"
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": null,`
			`"id": "9646b524-71a7-4b2a-bdc8-0b81f77e968f",`
			`"metadata": {},`
			`"outputs": [],`
			`"source": [`
			`"# Folder with pdf and extracted images\n",`
			`"from pathlib import Path\n",`
			`"\n",`
			`"# replace with actual path to images\n",`
			`"path = Path(\"../art\")"`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": null,`
			`"id": "77f096ab-a933-41d0-8f4e-1efc83998fc3",`
			`"metadata": {},`
			`"outputs": [],`
			`"source": [`
			`"path.resolve()"`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": null,`
			`"id": "bc4839c0-8773-4a07-ba59-5364501269b2",`
			`"metadata": {},`
			`"outputs": [],`
			`"source": [`
			`"# Extract images, tables, and chunk text\n",`
			`"from unstructured.partition.pdf import partition_pdf\n",`
			`"\n",`
			`"raw_pdf_elements = partition_pdf(\n",`
			`" filename=str(path.resolve()) + \"/getty.pdf\",\n",`
			`" extract_images_in_pdf=False,\n",`
			`" infer_table_structure=True,\n",`
			`" chunking_strategy=\"by_title\",\n",`
			`" max_characters=4000,\n",`
			`" new_after_n_chars=3800,\n",`
			`" combine_text_under_n_chars=2000,\n",`
			`" image_output_dir_path=path,\n",`
			`")"`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": null,`
			`"id": "969545ad",`
			`"metadata": {},`
			`"outputs": [],`
			`"source": [`
			`"# Categorize text elements by type\n",`
			`"tables = []\n",`
			`"texts = []\n",`
			`"for element in raw_pdf_elements:\n",`
			`" if \"unstructured.documents.elements.Table\" in str(type(element)):\n",`
			`" tables.append(str(element))\n",`
			`" elif \"unstructured.documents.elements.CompositeElement\" in str(type(element)):\n",`
			`" texts.append(str(element))"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"id": "5d8e6349-1547-4cbf-9c6f-491d8610ec10",`
			`"metadata": {},`
			`"source": [`
			`"## Multi-modal embeddings with our document\n",`
			`"\n",`
			`"We will use [nomic-embed-vision-v1.5](https://huggingface.co/nomic-ai/nomic-embed-vision-v1.5) embeddings. This model is aligned \n",`
			`"to [nomic-embed-text-v1.5](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5) allowing for multimodal semantic search and Multimodal RAG!"`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": null,`
			`"id": "4bc15842-cb95-4f84-9eb5-656b0282a800",`
			`"metadata": {},`
			`"outputs": [],`
			`"source": [`
			`"import os\n",`
			`"import uuid\n",`
			`"\n",`
			`"import chromadb\n",`
			`"import numpy as np\n",`
			`"from langchain_community.vectorstores import Chroma\n",`
			`"from langchain_nomic import NomicEmbeddings\n",`
			`"from PIL import Image as _PILImage\n",`
			`"\n",`
			`"# Create chroma\n",`
			`"text_vectorstore = Chroma(\n",`
			`" collection_name=\"mm_rag_clip_photos_text\",\n",`
			`" embedding_function=NomicEmbeddings(\n",`
			`" vision_model=\"nomic-embed-vision-v1.5\", model=\"nomic-embed-text-v1.5\"\n",`
			`" ),\n",`
			`")\n",`
			`"image_vectorstore = Chroma(\n",`
			`" collection_name=\"mm_rag_clip_photos_image\",\n",`
			`" embedding_function=NomicEmbeddings(\n",`
			`" vision_model=\"nomic-embed-vision-v1.5\", model=\"nomic-embed-text-v1.5\"\n",`
			`" ),\n",`
			`")\n",`
			`"\n",`
			`"# Get image URIs with .jpg extension only\n",`
			`"image_uris = sorted(\n",`
			`" [\n",`
			`" os.path.join(path, image_name)\n",`
			`" for image_name in os.listdir(path)\n",`
			`" if image_name.endswith(\".jpg\")\n",`
			`" ]\n",`
			`")\n",`
			`"\n",`
			`"# Add images\n",`
			`"image_vectorstore.add_images(uris=image_uris)\n",`
			`"\n",`
			`"# Add documents\n",`
			`"text_vectorstore.add_texts(texts=texts)\n",`
			`"\n",`
			`"# Make retriever\n",`
			`"image_retriever = image_vectorstore.as_retriever()\n",`
			`"text_retriever = text_vectorstore.as_retriever()"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"id": "02a186d0-27e0-4820-8092-63b5349dd25d",`
			`"metadata": {},`
			`"source": [`
			`"## RAG\n",`
			`"\n",`
			"`vectorstore.add_images` will store / retrieve images as base64 encoded strings.\n",
			`"\n",`
			`"These can be passed to [GPT-4V](https://platform.openai.com/docs/guides/vision)."`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": null,`
			`"id": "344f56a8-0dc3-433e-851c-3f7600c7a72b",`
			`"metadata": {},`
			`"outputs": [],`
			`"source": [`
			`"import base64\n",`
			`"import io\n",`
			`"from io import BytesIO\n",`
			`"\n",`
			`"import numpy as np\n",`
			`"from PIL import Image\n",`
			`"\n",`
			`"\n",`
			`"def resize_base64_image(base64_string, size=(128, 128)):\n",`
			`" \"\"\"\n",`
			`" Resize an image encoded as a Base64 string.\n",`
			`"\n",`
			`" Args:\n",`
			`" base64_string (str): Base64 string of the original image.\n",`
			`" size (tuple): Desired size of the image as (width, height).\n",`
			`"\n",`
			`" Returns:\n",`
			`" str: Base64 string of the resized image.\n",`
			`" \"\"\"\n",`
			`" # Decode the Base64 string\n",`
			`" img_data = base64.b64decode(base64_string)\n",`
			`" img = Image.open(io.BytesIO(img_data))\n",`
			`"\n",`
			`" # Resize the image\n",`
			`" resized_img = img.resize(size, Image.LANCZOS)\n",`
			`"\n",`
			`" # Save the resized image to a bytes buffer\n",`
			`" buffered = io.BytesIO()\n",`
			`" resized_img.save(buffered, format=img.format)\n",`
			`"\n",`
			`" # Encode the resized image to Base64\n",`
			`" return base64.b64encode(buffered.getvalue()).decode(\"utf-8\")\n",`
			`"\n",`
			`"\n",`
			`"def is_base64(s):\n",`
			`" \"\"\"Check if a string is Base64 encoded\"\"\"\n",`
			`" try:\n",`
			`" return base64.b64encode(base64.b64decode(s)) == s.encode()\n",`
			`" except Exception:\n",`
			`" return False\n",`
			`"\n",`
			`"\n",`
			`"def split_image_text_types(docs):\n",`
			`" \"\"\"Split numpy array images and texts\"\"\"\n",`
			`" images = []\n",`
			`" text = []\n",`
			`" for doc in docs:\n",`
			`" doc = doc.page_content # Extract Document contents\n",`
			`" if is_base64(doc):\n",`
			`" # Resize image to avoid OAI server error\n",`
			`" images.append(\n",`
			`" resize_base64_image(doc, size=(250, 250))\n",`
			`" ) # base64 encoded str\n",`
			`" else:\n",`
			`" text.append(doc)\n",`
			`" return {\"images\": images, \"texts\": text}"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"id": "23a2c1d8-fea6-4152-b184-3172dd46c735",`
			`"metadata": {},`
			`"source": [`
			"Currently, we format the inputs using a `RunnableLambda` while we add image support to `ChatPromptTemplates`.\n",
			`"\n",`
			`"Our runnable follows the classic RAG flow - \n",`
			`"\n",`
			`"* We first compute the context (both \"texts\" and \"images\" in this case) and the question (just a RunnablePassthrough here) \n",`
			`"* Then we pass this into our prompt template, which is a custom function that formats the message for the gpt-4-vision-preview model. \n",`
			`"* And finally we parse the output as a string."`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": null,`
			`"id": "5d8919dc-c238-4746-86ba-45d940a7d260",`
			`"metadata": {},`
			`"outputs": [],`
			`"source": [`
			`"import os\n",`
			`"\n",`
			`"os.environ[\"OPENAI_API_KEY\"] = \"\""`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": null,`
			`"id": "4c93fab3-74c4-4f1d-958a-0bc4cdd0797e",`
			`"metadata": {},`
			`"outputs": [],`
			`"source": [`
			`"from operator import itemgetter\n",`
			`"\n",`
			`"from langchain_core.messages import HumanMessage, SystemMessage\n",`
			`"from langchain_core.output_parsers import StrOutputParser\n",`
			`"from langchain_core.runnables import RunnableLambda, RunnablePassthrough\n",`
			`"from langchain_openai import ChatOpenAI\n",`
			`"\n",`
			`"\n",`
			`"def prompt_func(data_dict):\n",`
			`" # Joining the context texts into a single string\n",`
			`" formatted_texts = \"\\n\".join(data_dict[\"text_context\"][\"texts\"])\n",`
			`" messages = []\n",`
			`"\n",`
			`" # Adding image(s) to the messages if present\n",`
			`" if data_dict[\"image_context\"][\"images\"]:\n",`
			`" image_message = {\n",`
			`" \"type\": \"image_url\",\n",`
			`" \"image_url\": {\n",`
			`" \"url\": f\"data:image/jpeg;base64,{data_dict['image_context']['images'][0]}\"\n",`
			`" },\n",`
			`" }\n",`
			`" messages.append(image_message)\n",`
			`"\n",`
			`" # Adding the text message for analysis\n",`
			`" text_message = {\n",`
			`" \"type\": \"text\",\n",`
			`" \"text\": (\n",`
			`" \"As an expert art critic and historian, your task is to analyze and interpret images, \"\n",`
			`" \"considering their historical and cultural significance. Alongside the images, you will be \"\n",`
			`" \"provided with related text to offer context. Both will be retrieved from a vectorstore based \"\n",`
			`" \"on user-input keywords. Please use your extensive knowledge and analytical skills to provide a \"\n",`
			`" \"comprehensive summary that includes:\\n\"\n",`
			`" \"- A detailed description of the visual elements in the image.\\n\"\n",`
			`" \"- The historical and cultural context of the image.\\n\"\n",`
			`" \"- An interpretation of the image's symbolism and meaning.\\n\"\n",`
			`" \"- Connections between the image and the related text.\\n\\n\"\n",`
			`" f\"User-provided keywords: {data_dict['question']}\\n\\n\"\n",`
			`" \"Text and / or tables:\\n\"\n",`
			`" f\"{formatted_texts}\"\n",`
			`" ),\n",`
			`" }\n",`
			`" messages.append(text_message)\n",`
			`"\n",`
			`" return [HumanMessage(content=messages)]\n",`
			`"\n",`
			`"\n",`
			`"model = ChatOpenAI(temperature=0, model=\"gpt-4-vision-preview\", max_tokens=1024)\n",`
			`"\n",`
			`"# RAG pipeline\n",`
			`"chain = (\n",`
			`" {\n",`
			`" \"text_context\": text_retriever \| RunnableLambda(split_image_text_types),\n",`
			`" \"image_context\": image_retriever \| RunnableLambda(split_image_text_types),\n",`
			`" \"question\": RunnablePassthrough(),\n",`
			`" }\n",`
			`" \| RunnableLambda(prompt_func)\n",`
			`" \| model\n",`
			`" \| StrOutputParser()\n",`
			`")"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"id": "1566096d-97c2-4ddc-ba4a-6ef88c525e4e",`
			`"metadata": {},`
			`"source": [`
			`"## Test retrieval and run RAG"`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": null,`
			`"id": "90121e56-674b-473b-871d-6e4753fd0c45",`
			`"metadata": {},`
			`"outputs": [],`
			`"source": [`
			`"from IPython.display import HTML, display\n",`
			`"\n",`
			`"\n",`
			`"def plt_img_base64(img_base64):\n",`
			`" # Create an HTML img tag with the base64 string as the source\n",`
			`" image_html = f'<img src=\"data:image/jpeg;base64,{img_base64}\" />'\n",`
			`"\n",`
			`" # Display the image by rendering the HTML\n",`
			`" display(HTML(image_html))\n",`
			`"\n",`
			`"\n",`
			`"docs = text_retriever.invoke(\"Women with children\", k=5)\n",`
			`"for doc in docs:\n",`
			`" if is_base64(doc.page_content):\n",`
			`" plt_img_base64(doc.page_content)\n",`
			`" else:\n",`
			`" print(doc.page_content)"`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": null,`
			`"id": "44eaa532-f035-4c04-b578-02339d42554c",`
			`"metadata": {},`
			`"outputs": [],`
			`"source": [`
			`"docs = image_retriever.invoke(\"Women with children\", k=5)\n",`
			`"for doc in docs:\n",`
			`" if is_base64(doc.page_content):\n",`
			`" plt_img_base64(doc.page_content)\n",`
			`" else:\n",`
			`" print(doc.page_content)"`
			`]`
			`},`
			`{`
			`"cell_type": "code",`
			`"execution_count": null,`
			`"id": "69fb15fd-76fc-49b4-806d-c4db2990027d",`
			`"metadata": {},`
			`"outputs": [],`
			`"source": [`
			`"chain.invoke(\"Women with children\")"`
			`]`
			`},`
			`{`
			`"cell_type": "markdown",`
			`"id": "227f08b8-e732-4089-b65c-6eb6f9e48f15",`
			`"metadata": {},`
			`"source": [`
			`"We can see the images retrieved in the LangSmith trace:\n",`
			`"\n",`
			`"LangSmith [trace](https://smith.langchain.com/public/69c558a5-49dc-4c60-a49b-3adbb70f74c5/r/e872c2c8-528c-468f-aefd-8b5cd730a673)."`
			`]`
			`}`
			`],`
			`"metadata": {`
			`"kernelspec": {`
			`"display_name": "Python 3 (ipykernel)",`
			`"language": "python",`
			`"name": "python3"`
			`},`
			`"language_info": {`
			`"codemirror_mode": {`
			`"name": "ipython",`
			`"version": 3`
			`},`
			`"file_extension": ".py",`
			`"mimetype": "text/x-python",`
			`"name": "python",`
			`"nbconvert_exporter": "python",`
			`"pygments_lexer": "ipython3",`
			`"version": "3.11.9"`
			`}`
			`},`
			`"nbformat": 4,`
			`"nbformat_minor": 5`
			`}`