openai-cookbook/examples/How_to_combine_GPT4o_with_RAG_Outfit_Assistant.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# How to Combine GPT-4o Mini with RAG - Create a Clothing Matchmaker App\n",
    "\n",
    "Welcome to the Clothing Matchmaker App Jupyter Notebook! This project demonstrates the power of the GPT-4o mini model in analyzing images of clothing items and extracting key features such as color, style, and type. The core of our app relies on this advanced image analysis model developed by OpenAI, which enables us to accurately identify the characteristics of the input clothing item.\n",
    "\n",
    "GPT-4o mini is a small model that combines natural language processing with image recognition, allowing it to understand and generate responses based on both text and visual inputs with low latency.\n",
    "\n",
    "Building on the capabilities of the GPT-4o mini model, we employ a custom matching algorithm and the RAG technique to search our knowledge base for items that complement the identified features. This algorithm takes into account factors like color compatibility and style coherence to provide users with suitable recommendations. Through this notebook, we aim to showcase the practical application of these technologies in creating a clothing recommendation system.\n",
    "\n",
    "Using the combination of GPT-4o mini + RAG (Retrieval-Augmented Generation) offers several advantages:\n",
    "\n",
    "1. **Contextual Understanding**: GPT-4o mini can analyze input images and understand the context, such as the objects, scenes, and activities depicted. This allows for more accurate and relevant suggestions or information across various domains, whether it's interior design, cooking, or education.\n",
    "2. **Rich Knowledge Base**: RAG combines the generative capabilities of GPT-4 with a retrieval component that accesses a large corpus of information across different fields. This means the system can provide suggestions or insights based on a wide range of knowledge, from historical facts to scientific concepts.\n",
    "3. **Customization**: The approach allows for easy customization to cater to specific user needs or preferences in various applications. Whether it's tailoring suggestions to a user's taste in art or providing educational content based on a student's learning level, the system can be adapted to deliver personalized experiences.\n",
    "\n",
    "Overall, the GPT-4o mini + RAG approach offers a fast, powerful, and flexible solution for various fashion-related applications, leveraging the strengths of both generative and retrieval-based AI techniques.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Environment Setup"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First we will install the necessary dependencies, then import the libraries and write some utility functions that we will use later on. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install openai --quiet\n",
    "%pip install tenacity --quiet\n",
    "%pip install tqdm --quiet\n",
    "%pip install numpy --quiet\n",
    "%pip install typing --quiet\n",
    "%pip install tiktoken --quiet\n",
    "%pip install concurrent --quiet"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "import json\n",
    "import ast\n",
    "import tiktoken\n",
    "import concurrent\n",
    "from openai import OpenAI\n",
    "from tqdm import tqdm\n",
    "from tenacity import retry, wait_random_exponential, stop_after_attempt\n",
    "from IPython.display import Image, display, HTML\n",
    "from typing import List\n",
    "\n",
    "client = OpenAI()\n",
    "\n",
    "GPT_MODEL = \"gpt-4o-mini\"\n",
    "EMBEDDING_MODEL = \"text-embedding-3-large\"\n",
    "EMBEDDING_COST_PER_1K_TOKENS = 0.00013"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Creating the Embeddings\n",
    "We will now set up the knowledge base by choosing a database and generating embeddings for it. I am using the `sample_styles.csv` file for this in the data folder. This is a sample of a bigger dataset that contains `~44K` items. This step can also be replaced by using an out-of-the-box vector database. For example, you can follow one of [these cookbooks](https://github.com/openai/openai-cookbook/tree/main/examples/vector_databases) to set up your vector database."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "styles_filepath = \"data/sample_clothes/sample_styles.csv\"\n",
    "styles_df = pd.read_csv(styles_filepath, on_bad_lines='skip')\n",
    "print(styles_df.head())\n",
    "print(\"Opened dataset successfully. Dataset has {} items of clothing.\".format(len(styles_df)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we will generate embeddings for the entire dataset. We can parallelize the execution of these embeddings to ensure that the script scales up for larger datasets. With this logic, the time to create embeddings for the full `44K` entry dataset decreases from ~4h to ~2-3min. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## Batch Embedding Logic\n",
    "\n",
    "# Simple function to take in a list of text objects and return them as a list of embeddings\n",
    "@retry(wait=wait_random_exponential(min=1, max=40), stop=stop_after_attempt(10))\n",
    "def get_embeddings(input: List):\n",
    "    response = client.embeddings.create(\n",
    "        input=input,\n",
    "        model=EMBEDDING_MODEL\n",
    "    ).data\n",
    "    return [data.embedding for data in response]\n",
    "\n",
    "\n",
    "# Splits an iterable into batches of size n.\n",
    "def batchify(iterable, n=1):\n",
    "    l = len(iterable)\n",
    "    for ndx in range(0, l, n):\n",
    "        yield iterable[ndx : min(ndx + n, l)]\n",
    "     \n",
    "\n",
    "# Function for batching and parallel processing the embeddings\n",
    "def embed_corpus(\n",
    "    corpus: List[str],\n",
    "    batch_size=64,\n",
    "    num_workers=8,\n",
    "    max_context_len=8191,\n",
    "):\n",
    "    # Encode the corpus, truncating to max_context_len\n",
    "    encoding = tiktoken.get_encoding(\"cl100k_base\")\n",
    "    encoded_corpus = [\n",
    "        encoded_article[:max_context_len] for encoded_article in encoding.encode_batch(corpus)\n",
    "    ]\n",
    "\n",
    "    # Calculate corpus statistics: the number of inputs, the total number of tokens, and the estimated cost to embed\n",
    "    num_tokens = sum(len(article) for article in encoded_corpus)\n",
    "    cost_to_embed_tokens = num_tokens / 1000 * EMBEDDING_COST_PER_1K_TOKENS\n",
    "    print(\n",
    "        f\"num_articles={len(encoded_corpus)}, num_tokens={num_tokens}, est_embedding_cost={cost_to_embed_tokens:.2f} USD\"\n",
    "    )\n",
    "\n",
    "    # Embed the corpus\n",
    "    with concurrent.futures.ThreadPoolExecutor(max_workers=num_workers) as executor:\n",
    "        \n",
    "        futures = [\n",
    "            executor.submit(get_embeddings, text_batch)\n",
    "            for text_batch in batchify(encoded_corpus, batch_size)\n",
    "        ]\n",
    "\n",
    "        with tqdm(total=len(encoded_corpus)) as pbar:\n",
    "            for _ in concurrent.futures.as_completed(futures):\n",
    "                pbar.update(batch_size)\n",
    "\n",
    "        embeddings = []\n",
    "        for future in futures:\n",
    "            data = future.result()\n",
    "            embeddings.extend(data)\n",
    "\n",
    "        return embeddings\n",
    "    \n",
    "\n",
    "# Function to generate embeddings for a given column in a DataFrame\n",
    "def generate_embeddings(df, column_name):\n",
    "    # Initialize an empty list to store embeddings\n",
    "    descriptions = df[column_name].astype(str).tolist()\n",
    "    embeddings = embed_corpus(descriptions)\n",
    "\n",
    "    # Add the embeddings as a new column to the DataFrame\n",
    "    df['embeddings'] = embeddings\n",
    "    print(\"Embeddings created successfully.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Two options for creating the embeddings: \n",
    "The next line will **create the embeddings** for the sample clothes dataset. This will take around 0.02s to process and another ~30s to write the results to a local .csv file. The process is using our `text_embedding_3_large` model which is priced at `$0.00013/1K` tokens. Given that the dataset has around `1K` entries, the following operation will cost approximately `$0.001`. If you decide to work with the entire dataset of `44K` entries, this operation will take 2-3min to process and it will cost approximately `$0.07`.\n",
    "\n",
    "**If you would not like to proceed with creating your own embeddings**, we will use a dataset of pre-computed embeddings. You can skip this cell and uncomment the code in the following cell to proceed with loading the pre-computed vectors. This operation takes ~1min to load all the data in memory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "generate_embeddings(styles_df, 'productDisplayName')\n",
    "print(\"Writing embeddings to file ...\")\n",
    "styles_df.to_csv('data/sample_clothes/sample_styles_with_embeddings.csv', index=False)\n",
    "print(\"Embeddings successfully stored in sample_styles_with_embeddings.csv\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# styles_df = pd.read_csv('data/sample_clothes/sample_styles_with_embeddings.csv', on_bad_lines='skip')\n",
    "\n",
    "# # Convert the 'embeddings' column from string representations of lists to actual lists of floats\n",
    "# styles_df['embeddings'] = styles_df['embeddings'].apply(lambda x: ast.literal_eval(x))\n",
    "\n",
    "print(styles_df.head())\n",
    "print(\"Opened dataset successfully. Dataset has {} items of clothing along with their embeddings.\".format(len(styles_df)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Building the Matching Algorithm\n",
    "\n",
    "In this section, we'll develop a cosine similarity retrieval algorithm to find similar items in our dataframe. We'll utilize our custom cosine similarity function for this purpose. While the `sklearn` library offers a built-in cosine similarity function, recent updates to its SDK have led to compatibility issues, prompting us to implement our own standard cosine similarity calculation.\n",
    "\n",
    "If you already have a vector database set up, you can skip this step. Most standard databases come with their own search functions, which simplify the subsequent steps outlined in this guide. However, we aim to demonstrate that the matching algorithm can be tailored to meet specific requirements, such as a particular threshold or a specified number of matches returned.\n",
    "\n",
    "The `find_similar_items` function accepts four parameters:\n",
    "- `embedding`: The embedding for which we want to find a match.\n",
    "- `embeddings`: A list of embeddings to search through for the best matches.\n",
    "- `threshold` (optional): This parameter specifies the minimum similarity score for a match to be considered valid. A higher threshold results in closer (better) matches, while a lower threshold allows for more items to be returned, though they may not be as closely matched to the initial `embedding`.\n",
    "- `top_k` (optional): This parameter determines the number of items to return that exceed the given threshold. These will be the top-scoring matches for the provided `embedding`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def cosine_similarity_manual(vec1, vec2):\n",
    "    \"\"\"Calculate the cosine similarity between two vectors.\"\"\"\n",
    "    vec1 = np.array(vec1, dtype=float)\n",
    "    vec2 = np.array(vec2, dtype=float)\n",
    "\n",
    "\n",
    "    dot_product = np.dot(vec1, vec2)\n",
    "    norm_vec1 = np.linalg.norm(vec1)\n",
    "    norm_vec2 = np.linalg.norm(vec2)\n",
    "    return dot_product / (norm_vec1 * norm_vec2)\n",
    "\n",
    "\n",
    "def find_similar_items(input_embedding, embeddings, threshold=0.5, top_k=2):\n",
    "    \"\"\"Find the most similar items based on cosine similarity.\"\"\"\n",
    "    \n",
    "    # Calculate cosine similarity between the input embedding and all other embeddings\n",
    "    similarities = [(index, cosine_similarity_manual(input_embedding, vec)) for index, vec in enumerate(embeddings)]\n",
    "    \n",
    "    # Filter out any similarities below the threshold\n",
    "    filtered_similarities = [(index, sim) for index, sim in similarities if sim >= threshold]\n",
    "    \n",
    "    # Sort the filtered similarities by similarity score\n",
    "    sorted_indices = sorted(filtered_similarities, key=lambda x: x[1], reverse=True)[:top_k]\n",
    "\n",
    "    # Return the top-k most similar items\n",
    "    return sorted_indices"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def find_matching_items_with_rag(df_items, item_descs):\n",
    "   \"\"\"Take the input item descriptions and find the most similar items based on cosine similarity for each description.\"\"\"\n",
    "   \n",
    "   # Select the embeddings from the DataFrame.\n",
    "   embeddings = df_items['embeddings'].tolist()\n",
    "\n",
    "   \n",
    "   similar_items = []\n",
    "   for desc in item_descs:\n",
    "      \n",
    "      # Generate the embedding for the input item\n",
    "      input_embedding = get_embeddings([desc])\n",
    "    \n",
    "      # Find the most similar items based on cosine similarity\n",
    "      similar_indices = find_similar_items(input_embedding, embeddings, threshold=0.6)\n",
    "      similar_items += [df_items.iloc[i] for i in similar_indices]\n",
    "    \n",
    "   return similar_items"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Analysis Module\n",
    "\n",
    "In this module, we leverage `gpt-4o-mini` to analyze input images and extract important features like detailed descriptions, styles, and types. The analysis is performed through a straightforward API call, where we provide the URL of the image for analysis and request the model to identify relevant features.\n",
    "\n",
    "To ensure the model returns accurate results, we use specific techniques in our prompt:\n",
    "\n",
    "1. **Output Format Specification**: We instruct the model to return a JSON block with a predefined structure, consisting of:\n",
    "   - `items` (str[]): A list of strings, each representing a concise title for an item of clothing, including style, color, and gender. These titles closely resemble the `productDisplayName` property in our original database.\n",
    "   - `category` (str): The category that best represents the given item. The model selects from a list of all unique `articleTypes` present in the original styles dataframe.\n",
    "   - `gender` (str): A label indicating the gender the item is intended for. The model chooses from the options `[Men, Women, Boys, Girls, Unisex]`.\n",
    "\n",
    "2. **Clear and Concise Instructions**: \n",
    "   - We provide clear instructions on what the item titles should include and what the output format should be. The output should be in JSON format, but without the `json` tag that the model response normally contains.\n",
    "\n",
    "3. **One Shot Example**: \n",
    "   - To further clarify the expected output, we provide the model with an example input description and a corresponding example output. Although this may increase the number of tokens used (and thus the cost of the call), it helps to guide the model and results in better overall performance.\n",
    "\n",
    "By following this structured approach, we aim to obtain precise and useful information from the `gpt-4o-mini` model for further analysis and integration into our database."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def analyze_image(image_base64, subcategories):\n",
    "    response = client.chat.completions.create(\n",
    "        model=GPT_MODEL,\n",
    "        messages=[\n",
    "            {\n",
    "            \"role\": \"user\",\n",
    "            \"content\": [\n",
    "                {\n",
    "                \"type\": \"text\",\n",
    "                \"text\": \"\"\"Given an image of an item of clothing, analyze the item and generate a JSON output with the following fields: \"items\", \"category\", and \"gender\". \n",
    "                           Use your understanding of fashion trends, styles, and gender preferences to provide accurate and relevant suggestions for how to complete the outfit.\n",
    "                           The items field should be a list of items that would go well with the item in the picture. Each item should represent a title of an item of clothing that contains the style, color, and gender of the item.\n",
    "                           The category needs to be chosen between the types in this list: {subcategories}.\n",
    "                           You have to choose between the genders in this list: [Men, Women, Boys, Girls, Unisex]\n",
    "                           Do not include the description of the item in the picture. Do not include the ```json ``` tag in the output.\n",
    "                           \n",
    "                           Example Input: An image representing a black leather jacket.\n",
    "\n",
    "                           Example Output: {\"items\": [\"Fitted White Women's T-shirt\", \"White Canvas Sneakers\", \"Women's Black Skinny Jeans\"], \"category\": \"Jackets\", \"gender\": \"Women\"}\n",
    "                           \"\"\",\n",
    "                },\n",
    "                {\n",
    "                \"type\": \"image_url\",\n",
    "                \"image_url\": {\n",
    "                    \"url\": f\"data:image/jpeg;base64,{image_base64}\",\n",
    "                },\n",
    "                }\n",
    "            ],\n",
    "            }\n",
    "        ]\n",
    "    )\n",
    "    # Extract relevant features from the response\n",
    "    features = response.choices[0].message.content\n",
    "    return features"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Testing the Prompt with Sample Images\n",
    "\n",
    "To evaluate the effectiveness of our prompt, let's load and test it with a selection of images from our dataset. We'll use images from the `\"data/sample_clothes/sample_images\"` folder, ensuring a variety of styles, genders, and types. Here are the chosen samples:\n",
    "\n",
    "- `2133.jpg`: Men's shirt\n",
    "- `7143.jpg`: Women's shirt\n",
    "- `4226.jpg`: Casual men's printed t-shirt\n",
    "\n",
    "By testing the prompt with these diverse images, we can assess its ability to accurately analyze and extract relevant features from different types of clothing items and accessories."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We need a utility function to encode the .jpg images in base64"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import base64\n",
    "\n",
    "def encode_image_to_base64(image_path):\n",
    "    with open(image_path, 'rb') as image_file:\n",
    "        encoded_image = base64.b64encode(image_file.read())\n",
    "        return encoded_image.decode('utf-8')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Set the path to the images and select a test image\n",
    "image_path = \"data/sample_clothes/sample_images/\"\n",
    "test_images = [\"2133.jpg\", \"7143.jpg\", \"4226.jpg\"]\n",
    "\n",
    "# Encode the test image to base64\n",
    "reference_image = image_path + test_images[0]\n",
    "encoded_image = encode_image_to_base64(reference_image)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Select the unique subcategories from the DataFrame\n",
    "unique_subcategories = styles_df['articleType'].unique()\n",
    "\n",
    "# Analyze the image and return the results\n",
    "analysis = analyze_image(encoded_image, unique_subcategories)\n",
    "image_analysis = json.loads(analysis)\n",
    "\n",
    "# Display the image and the analysis results\n",
    "display(Image(filename=reference_image))\n",
    "print(image_analysis)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we process the output from the image analysis and use it to filter and display matching items from our dataset. Here's a breakdown of the code:\n",
    "\n",
    "1. **Extracting Image Analysis Results**: We extract the item descriptions, category, and gender from the `image_analysis` dictionary.\n",
    "\n",
    "2. **Filtering the Dataset**: We filter the `styles_df` DataFrame to include only items that match the gender from the image analysis (or are unisex) and exclude items of the same category as the analyzed image.\n",
    "\n",
    "3. **Finding Matching Items**: We use the `find_matching_items_with_rag` function to find items in the filtered dataset that match the descriptions extracted from the analyzed image.\n",
    "\n",
    "4. **Displaying Matching Items**: We create an HTML string to display images of the matching items. We construct the image paths using the item IDs and append each image to the HTML string. Finally, we use `display(HTML(html))` to render the images in the notebook.\n",
    "\n",
    "This cell effectively demonstrates how to use the results of image analysis to filter a dataset and visually display items that match the analyzed image's characteristics."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Extract the relevant features from the analysis\n",
    "item_descs = image_analysis['items']\n",
    "item_category = image_analysis['category']\n",
    "item_gender = image_analysis['gender']\n",
    "\n",
    "\n",
    "# Filter data such that we only look through the items of the same gender (or unisex) and different category\n",
    "filtered_items = styles_df.loc[styles_df['gender'].isin([item_gender, 'Unisex'])]\n",
    "filtered_items = filtered_items[filtered_items['articleType'] != item_category]\n",
    "print(str(len(filtered_items)) + \" Remaining Items\")\n",
    "\n",
    "# Find the most similar items based on the input item descriptions\n",
    "matching_items = find_matching_items_with_rag(filtered_items, item_descs)\n",
    "\n",
    "# Display the matching items (this will display 2 items for each description in the image analysis)\n",
    "html = \"\"\n",
    "paths = []\n",
    "for i, item in enumerate(matching_items):\n",
    "    item_id = item['id']\n",
    "        \n",
    "    # Path to the image file\n",
    "    image_path = f'data/sample_clothes/sample_images/{item_id}.jpg'\n",
    "    paths.append(image_path)\n",
    "    html += f'<img src=\"{image_path}\" style=\"display:inline;margin:1px\"/>'\n",
    "\n",
    "# Print the matching item description as a reminder of what we are looking for\n",
    "print(item_descs)\n",
    "# Display the image\n",
    "display(HTML(html))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Guardrails\n",
    "\n",
    "In the context of using Large Language Models (LLMs) like GPT-4o mini, \"guardrails\" refer to mechanisms or checks put in place to ensure that the model's output remains within desired parameters or boundaries. These guardrails are crucial for maintaining the quality and relevance of the model's responses, especially when dealing with complex or nuanced tasks.\n",
    "\n",
    "Guardrails are useful for several reasons:\n",
    "\n",
    "1. **Accuracy**: They help ensure that the model's output is accurate and relevant to the input provided.\n",
    "2. **Consistency**: They maintain consistency in the model's responses, especially when dealing with similar or related inputs.\n",
    "3. **Safety**: They prevent the model from generating harmful, offensive, or inappropriate content.\n",
    "4. **Contextual Relevance**: They ensure that the model's output is contextually relevant to the specific task or domain it is being used for.\n",
    "\n",
    "In our case, we are using GPT-4o mini to analyze fashion images and suggest items that would complement an original outfit. To implement guardrails, we can **refine results**: After obtaining initial suggestions from GPT-4o mini, we can send the original image and the suggested items back to the model. We can then ask GPT-4o mini to evaluate whether each suggested item would indeed be a good fit for the original outfit.\n",
    "\n",
    "This gives the model the ability to self-correct and adjust its own output based on feedback or additional information. By implementing these guardrails and enabling self-correction, we can enhance the reliability and usefulness of the model's output in the context of fashion analysis and recommendation.\n",
    "\n",
    "To facilitate this, we write a prompt that asks the LLM for a simple \"yes\" or \"no\" answer to the question of whether the suggested items match the original outfit or not. This binary response helps streamline the refinement process and ensures clear and actionable feedback from the model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def check_match(reference_image_base64, suggested_image_base64):\n",
    "    response = client.chat.completions.create(\n",
    "        model=GPT_MODEL,\n",
    "        messages=[\n",
    "            {\n",
    "            \"role\": \"user\",\n",
    "            \"content\": [\n",
    "                {\n",
    "                \"type\": \"text\",\n",
    "                \"text\": \"\"\" You will be given two images of two different items of clothing.\n",
    "                            Your goal is to decide if the items in the images would work in an outfit together.\n",
    "                            The first image is the reference item (the item that the user is trying to match with another item).\n",
    "                            You need to decide if the second item would work well with the reference item.\n",
    "                            Your response must be a JSON output with the following fields: \"answer\", \"reason\".\n",
    "                            The \"answer\" field must be either \"yes\" or \"no\", depending on whether you think the items would work well together.\n",
    "                            The \"reason\" field must be a short explanation of your reasoning for your decision. Do not include the descriptions of the 2 images.\n",
    "                            Do not include the ```json ``` tag in the output.\n",
    "                           \"\"\",\n",
    "                },\n",
    "                {\n",
    "                \"type\": \"image_url\",\n",
    "                \"image_url\": {\n",
    "                    \"url\": f\"data:image/jpeg;base64,{reference_image_base64}\",\n",
    "                },\n",
    "                },\n",
    "                {\n",
    "                \"type\": \"image_url\",\n",
    "                \"image_url\": {\n",
    "                    \"url\": f\"data:image/jpeg;base64,{suggested_image_base64}\",\n",
    "                },\n",
    "                }\n",
    "            ],\n",
    "            }\n",
    "        ],\n",
    "        max_tokens=300,\n",
    "    )\n",
    "    # Extract relevant features from the response\n",
    "    features = response.choices[0].message.content\n",
    "    return features"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, let's determine which of the items identified above truly complement the outfit."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Select the unique paths for the generated images\n",
    "paths = list(set(paths))\n",
    "\n",
    "for path in paths:\n",
    "    # Encode the test image to base64\n",
    "    suggested_image = encode_image_to_base64(path)\n",
    "    \n",
    "    # Check if the items match\n",
    "    match = json.loads(check_match(encoded_image, suggested_image))\n",
    "    \n",
    "    # Display the image and the analysis results\n",
    "    if match[\"answer\"] == 'yes':\n",
    "        display(Image(filename=path))\n",
    "        print(\"The items match!\")\n",
    "        print(match[\"reason\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can observe that the initial list of potential items has been further refined, resulting in a more curated selection that aligns well with the outfit. Additionally, the model provides explanations for why each item is considered a good match, offering valuable insights into the decision-making process."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Conclusion\n",
    "\n",
    "In this Jupyter Notebook, we explored the application of GPT-4o mini and other machine learning techniques to the domain of fashion. We demonstrated how to analyze images of clothing items, extract relevant features, and use this information to find matching items that complement an original outfit. Through the implementation of guardrails and self-correction mechanisms, we refined the model's suggestions to ensure they are accurate and contextually relevant.\n",
    "\n",
    "This approach has several practical uses in the real world, including:\n",
    "\n",
    "1. **Personalized Shopping Assistants**: Retailers can use this technology to offer personalized outfit recommendations to customers, enhancing the shopping experience and increasing customer satisfaction.\n",
    "2. **Virtual Wardrobe Applications**: Users can upload images of their own clothing items to create a virtual wardrobe and receive suggestions for new items that match their existing pieces.\n",
    "3. **Fashion Design and Styling**: Fashion designers and stylists can use this tool to experiment with different combinations and styles, streamlining the creative process.\n",
    "\n",
    "However, one of the considerations to keep in mind is **cost**. The use of LLMs and image analysis models can incur costs, especially if used extensively. It's important to consider the cost-effectiveness of implementing these technologies. `gpt-4o-mini` is priced at `$0.01` per 1000 tokens. This adds up to `$0.00255` for one 256px x 256px image.\n",
    "\n",
    "Overall, this notebook serves as a foundation for further exploration and development in the intersection of fashion and AI, opening doors to more personalized and intelligent fashion recommendation systems."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}