"Welcome to the Clothing Matchmaker App Jupyter Notebook! This project demonstrates the power of the GPT-4o model in analyzing images of clothing items and extracting key features such as color, style, and type. The core of our app relies on this advanced image analysis model developed by OpenAI, which enables us to accurately identify the characteristics of the input clothing item.\n",
"GPT-4o is a model that combines natural language processing with image recognition, allowing it to understand and generate responses based on both text and visual inputs.\n",
"Building on the capabilities of the GPT-4o model, we employ a custom matching algorithm and the RAG technique to search our knowledge base for items that complement the identified features. This algorithm takes into account factors like color compatibility and style coherence to provide users with suitable recommendations. Through this notebook, we aim to showcase the practical application of these technologies in creating a clothing recommendation system.\n",
"\n",
"The GPT-4o + RAG combination offers several advantages:\n",
"1. **Contextual Understanding**: GPT-4o can analyze input images and understand the context, such as the objects, scenes, and activities depicted. This allows for more accurate and relevant suggestions or information across various domains, whether it's interior design, cooking, or education.\n",
"2. **Rich Knowledge Base**: RAG combines the generative capabilities of GPT-4o with a retrieval component that accesses a large corpus of information across different fields. This means the system can provide suggestions or insights based on a wide range of knowledge, from historical facts to scientific concepts.\n",
"3. **Customization**: The approach allows for easy customization to cater to specific user needs or preferences in various applications. Whether it's tailoring suggestions to a user's taste in art or providing educational content based on a student's learning level, the system can be adapted to deliver personalized experiences.\n",
"Overall, the GPT-4o + RAG approach offers a powerful and flexible solution for various fashion-related applications, leveraging the strengths of both generative and retrieval-based AI techniques."
"We will now set up the knowledge base by choosing a database and generating embeddings for it. We will use the `sample_styles.csv` file in the data folder, a sample of a larger dataset that contains `~44K` items. This step can also be replaced by using an out-of-the-box vector database. For example, you can follow one of [these cookbooks](https://github.com/openai/openai-cookbook/tree/main/examples/vector_databases) to set up your vector database."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" id gender masterCategory subCategory articleType baseColour season \\\n",
"0 27152 Men Apparel Topwear Shirts Blue Summer \n",
"1 10469 Men Apparel Topwear Tshirts Yellow Fall \n",
"2 17169 Men Apparel Topwear Shirts Maroon Fall \n",
"3 56702 Men Apparel Topwear Kurtas Blue Summer \n",
"4 47062 Women Apparel Bottomwear Patiala Multi Fall \n",
"\n",
" year usage productDisplayName \n",
"0 2012.0 Formal Mark Taylor Men Striped Blue Shirt \n",
"1 2011.0 Casual Flying Machine Men Yellow Polo Tshirts \n",
"2 2011.0 Casual U.S. Polo Assn. Men Checks Maroon Shirt \n",
"3 2012.0 Ethnic Fabindia Men Blue Kurta \n",
"4 2012.0 Ethnic Shree Women Multi Colored Patiala \n",
"Opened dataset successfully. Dataset has 1000 items of clothing.\n"
]
}
],
"source": [
"styles_filepath = \"data/sample_clothes/sample_styles.csv\"\n",
"styles_df = pd.read_csv(styles_filepath, on_bad_lines='skip')\n",
"print(styles_df.head())\n",
"print(\"Opened dataset successfully. Dataset has {} items of clothing.\".format(len(styles_df)))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we will generate embeddings for the entire dataset. We can parallelize the execution of these embeddings to ensure that the script scales up for larger datasets. With this logic, the time to create embeddings for the full `44K` entry dataset decreases from ~4h to ~2-3min. "
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"## Batch Embedding Logic\n",
"\n",
"# Simple function to take in a list of text objects and return them as a list of embeddings\n",
"def generate_embeddings(df, column, model='text-embedding-3-large'):\n",
"    response = client.embeddings.create(input=df[column].tolist(), model=model)\n",
"    embeddings = [item.embedding for item in response.data]\n",
"    # Add the embeddings as a new column to the DataFrame\n",
"    df['embeddings'] = embeddings\n",
"    print(\"Embeddings created successfully.\")"
]
},
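The parallel path mentioned earlier can be sketched as follows. This is a minimal illustration rather than the notebook's exact code: `embed_corpus` and `embed_fn` are hypothetical names, and `embed_fn` stands in for a thin wrapper around the embeddings API call, so the batching logic itself stays testable.

```python
from concurrent.futures import ThreadPoolExecutor

def embed_corpus(texts, embed_fn, batch_size=64, workers=8):
    """Embed texts in parallel batches. `embed_fn` maps a list of strings
    to a list of embedding vectors (e.g. a wrapper around the API call)."""
    # Split the corpus into fixed-size batches
    batches = [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]
    # Run one embedding call per batch across a thread pool
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(embed_fn, batches)
    # Flatten the per-batch results back into one ordered list
    return [emb for batch in results for emb in batch]
```

Because `ThreadPoolExecutor.map` preserves input order, the flattened list lines up with the original rows even though batches complete out of order.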
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Two options for creating the embeddings: \n",
"The next line will **create the embeddings** for the sample clothes dataset. This will take around 0.02s to process and another ~30s to write the results to a local .csv file. The process uses our `text-embedding-3-large` model, which is priced at `$0.00013/1K` tokens. Given that the dataset has around `1K` entries, the following operation will cost approximately `$0.001`. If you decide to work with the entire dataset of `44K` entries, this operation will take 2-3min to process and it will cost approximately `$0.07`.\n",
"\n",
"**If you would not like to proceed with creating your own embeddings**, we will use a dataset of pre-computed embeddings. You can skip this cell and uncomment the code in the following cell to proceed with loading the pre-computed vectors. This operation takes ~1min to load all the data in memory."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(\"Opened dataset successfully. Dataset has {} items of clothing along with their embeddings.\".format(len(styles_df)))"
]
},
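A small illustration of why loading pre-computed embeddings needs care: an embeddings column saved to a .csv file comes back as stringified lists, which must be parsed into real Python lists on load. The snippet below uses an inline stand-in for the real file and the stdlib `csv` module plus `ast.literal_eval`; with `pandas.read_csv` the same effect is achieved via the `converters` argument.

```python
import csv
import io
from ast import literal_eval

# Stand-in for the pre-computed embeddings file: each embedding is a
# stringified list, so a plain read would give strings, not vectors.
csv_text = 'id,embeddings\n1,"[0.1, 0.2]"\n2,"[0.3, 0.4]"\n'

rows = list(csv.DictReader(io.StringIO(csv_text)))
# Parse each stringified list back into a list of floats
embeddings = [literal_eval(row["embeddings"]) for row in rows]
```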
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Building the Matching Algorithm\n",
"\n",
"In this section, we'll develop a cosine similarity retrieval algorithm to find similar items in our dataframe. We'll utilize our custom cosine similarity function for this purpose. While the `sklearn` library offers a built-in cosine similarity function, recent updates to its SDK have led to compatibility issues, prompting us to implement our own standard cosine similarity calculation.\n",
"\n",
"If you already have a vector database set up, you can skip this step. Most standard databases come with their own search functions, which simplify the subsequent steps outlined in this guide. However, we aim to demonstrate that the matching algorithm can be tailored to meet specific requirements, such as a particular threshold or a specified number of matches returned.\n",
"\n",
"The `find_similar_items` function accepts four parameters:\n",
"- `embedding`: The embedding for which we want to find a match.\n",
"- `embeddings`: A list of embeddings to search through for the best matches.\n",
"- `threshold` (optional): This parameter specifies the minimum similarity score for a match to be considered valid. A higher threshold results in closer (better) matches, while a lower threshold allows for more items to be returned, though they may not be as closely matched to the initial `embedding`.\n",
"- `top_k` (optional): This parameter determines the number of items to return that exceed the given threshold. These will be the top-scoring matches for the provided `embedding`."
]
},
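Based on the description above, a minimal sketch of `find_similar_items` might look like the following. The cosine helper is written out in plain Python here, and the default values for `threshold` and `top_k` are illustrative, not the notebook's exact choices.

```python
from math import sqrt

def cosine_similarity_manual(vec1, vec2):
    """Standard cosine similarity, written out without library helpers."""
    dot = sum(a * b for a, b in zip(vec1, vec2))
    norm1 = sqrt(sum(a * a for a in vec1))
    norm2 = sqrt(sum(b * b for b in vec2))
    return dot / (norm1 * norm2)

def find_similar_items(embedding, embeddings, threshold=0.5, top_k=2):
    """Return (index, score) pairs for the top_k embeddings whose cosine
    similarity to `embedding` meets the threshold, best matches first."""
    scores = [(i, cosine_similarity_manual(embedding, e)) for i, e in enumerate(embeddings)]
    # Drop candidates below the minimum similarity, then rank the rest
    scores = [(i, s) for i, s in scores if s >= threshold]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]
```

Raising `threshold` trades recall for precision, while `top_k` caps how many recommendations flow into the later outfit-matching steps.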
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"def cosine_similarity_manual(vec1, vec2):\n",
"    \"\"\"Calculate the cosine similarity between two vectors.\"\"\"\n",
"    vec1, vec2 = np.array(vec1), np.array(vec2)\n",
"    return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this module, we leverage `gpt-4o` to analyze input images and extract important features like detailed descriptions, styles, and types. The analysis is performed through a straightforward API call, where we provide the URL of the image for analysis and request the model to identify relevant features.\n",
"To ensure the model returns accurate results, we use specific techniques in our prompt:\n",
"\n",
"1. **Output Format Specification**: We instruct the model to return a JSON block with a predefined structure, consisting of:\n",
" - `items` (str[]): A list of strings, each representing a concise title for an item of clothing, including style, color, and gender. These titles closely resemble the `productDisplayName` property in our original database.\n",
" - `category` (str): The category that best represents the given item. The model selects from a list of all unique `articleTypes` present in the original styles dataframe.\n",
" - `gender` (str): A label indicating the gender the item is intended for. The model chooses from the options `[Men, Women, Boys, Girls, Unisex]`.\n",
"\n",
"2. **Clear and Concise Instructions**: \n",
" - We provide clear instructions on what the item titles should include and what the output format should be. The output should be in JSON format, but without the `json` tag that the model response normally contains.\n",
"\n",
"3. **One-Shot Example**: \n",
"   - To further clarify the expected output, we provide the model with an example input description and a corresponding example output. Although this may increase the number of tokens used (and thus the cost of the call), it helps guide the model and results in better overall performance.\n",
"\n",
"By following this structured approach, we aim to obtain precise and useful information from the `gpt-4o` model for further analysis and integration into our database."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def analyze_image(image_base64, subcategories):\n",
"    response = client.chat.completions.create(\n",
"        model=\"gpt-4o\",\n",
"        messages=[{\n",
"            \"role\": \"user\",\n",
"            \"content\": [\n",
"                {\n",
"                    \"type\": \"text\",\n",
"                    \"text\": \"\"\"Given an image of an item of clothing, analyze the item and generate a JSON output with the following fields: \"items\", \"category\", and \"gender\". \n",
"                    Use your understanding of fashion trends, styles, and gender preferences to provide accurate and relevant suggestions for how to complete the outfit.\n",
"                    The items field should be a list of items that would go well with the item in the picture. Each item should represent a title of an item of clothing that contains the style, color, and gender of the item.\n",
"                    The category needs to be chosen between the types in this list: {subcategories}.\n",
"                    You have to choose between the genders in this list: [Men, Women, Boys, Girls, Unisex]\n",
"                    Do not include the description of the item in the picture. Do not include the ```json ``` tag in the output.\n",
"\n",
"                    Example Input: An image representing a black leather jacket.\n",
"\n",
"                    Example Output: {\"items\": [\"Fitted White Women's T-shirt\", \"White Canvas Sneakers\", \"Women's Black Skinny Jeans\"], \"category\": \"Jackets\", \"gender\": \"Women\"}\n",
"                    \"\"\".replace(\"{subcategories}\", str(subcategories)),\n",
"                },\n",
"                {\n",
"                    \"type\": \"image_url\",\n",
"                    \"image_url\": {\"url\": f\"data:image/jpeg;base64,{image_base64}\"},\n",
"                },\n",
"            ],\n",
"        }],\n",
"        max_tokens=300,\n",
"    )\n",
"    # Extract relevant features from the response\n",
"    features = response.choices[0].message.content\n",
"    return features"
]
},
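As a hypothetical example of consuming the analysis output: because the prompt forbids the ```json``` fence, the returned string can be parsed directly with `json.loads`. The `raw_features` value below is a made-up stand-in for a real model reply, with the field names taken from the output format specified above.

```python
import json

# Stand-in for the string returned by the analysis call (no ```json fence)
raw_features = '{"items": ["White Canvas Sneakers"], "category": "Jackets", "gender": "Women"}'

# Parse the reply into a dict and pull out the three expected fields
features = json.loads(raw_features)
item_descs = features["items"]
item_category = features["category"]
item_gender = features["gender"]
```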
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Testing the Prompt with Sample Images\n",
"\n",
"To evaluate the effectiveness of our prompt, let's load and test it with a selection of images from our dataset. We'll use images from the `\"data/sample_clothes/sample_images\"` folder, ensuring a variety of styles, genders, and types. Here are the chosen samples:\n",
"\n",
"- `2133.jpg`: Men's shirt\n",
"- `7143.jpg`: Women's shirt\n",
"- `4226.jpg`: Casual men's printed t-shirt\n",
"\n",
"By testing the prompt with these diverse images, we can assess its ability to accurately analyze and extract relevant features from different types of clothing items and accessories."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We need a utility function to encode the .jpg images in base64."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def encode_image_to_base64(image_path):\n",
"    # Read the image file and return its contents as a base64 string\n",
"    with open(image_path, 'rb') as image_file:\n",
"        encoded_image = base64.b64encode(image_file.read())\n",
"        return encoded_image.decode('utf-8')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we process the output from the image analysis and use it to filter and display matching items from our dataset. Here's a breakdown of the code:\n",
"\n",
"1. **Extracting Image Analysis Results**: We extract the item descriptions, category, and gender from the `image_analysis` dictionary.\n",
"\n",
"2. **Filtering the Dataset**: We filter the `styles_df` DataFrame to include only items that match the gender from the image analysis (or are unisex) and exclude items of the same category as the analyzed image.\n",
"\n",
"3. **Finding Matching Items**: We use the `find_matching_items_with_rag` function to find items in the filtered dataset that match the descriptions extracted from the analyzed image.\n",
"\n",
"4. **Displaying Matching Items**: We create an HTML string to display images of the matching items. We construct the image paths using the item IDs and append each image to the HTML string. Finally, we use `display(HTML(html))` to render the images in the notebook.\n",
"\n",
"This cell effectively demonstrates how to use the results of image analysis to filter a dataset and visually display items that match the analyzed image's characteristics."
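Steps 1-2 of the breakdown above can be sketched as a small helper. The `filter_candidates` name is illustrative rather than the notebook's verbatim code; the column names `gender` and `articleType` match the dataset preview shown earlier.

```python
import pandas as pd

def filter_candidates(styles_df, item_gender, item_category):
    """Keep items intended for the analyzed gender (or Unisex) and
    drop items in the same category as the analyzed image."""
    out = styles_df[styles_df["gender"].isin([item_gender, "Unisex"])]
    return out[out["articleType"] != item_category]
```

Excluding the analyzed item's own category ensures the recommendations complete the outfit (e.g. shoes or trousers for a shirt) instead of suggesting near-duplicates of the input item.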
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Guardrails\n",
"\n",
"In the context of using Large Language Models (LLMs) like GPT-4o, \"guardrails\" refer to mechanisms or checks put in place to ensure that the model's output remains within desired parameters or boundaries. These guardrails are crucial for maintaining the quality and relevance of the model's responses, especially when dealing with complex or nuanced tasks.\n",
"\n",
"In our case, we are using GPT-4o to analyze fashion images and suggest items that would complement an original outfit. To implement guardrails, we can **refine results**: after obtaining initial suggestions from GPT-4o, we can send the original image and the suggested items back to the model and ask it to evaluate whether each suggested item would indeed be a good fit for the original outfit.\n",
"\n",
"This gives the model the ability to self-correct and adjust its own output based on feedback or additional information. By implementing these guardrails and enabling self-correction, we can enhance the reliability and usefulness of the model's output in the context of fashion analysis and recommendation.\n",
"\n",
"To facilitate this, we write a prompt that asks the LLM for a simple \"yes\" or \"no\" answer to the question of whether the suggested items match the original outfit. This binary response helps streamline the refinement process and ensures clear and actionable feedback from the model."
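On the consuming side, the binary reply can be validated before use. This is a sketch under assumptions: `parse_match_verdict` is a hypothetical helper name, and the expected JSON shape (an `answer` of "yes"/"no" plus a `reason`) follows the guardrail prompt described above.

```python
import json

def parse_match_verdict(raw):
    """Validate the guardrail reply: a JSON object with a yes/no "answer"
    and a short "reason". Raises if the verdict is malformed."""
    verdict = json.loads(raw)
    answer = verdict["answer"].strip().lower()
    if answer not in ("yes", "no"):
        raise ValueError(f"unexpected answer: {answer!r}")
    return answer, verdict.get("reason", "")
```

Failing fast on a malformed verdict keeps a single bad model reply from silently admitting a poor match into the final recommendations.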
"The black shirt and the white sneakers with red and beige accents can work well together. Black is a versatile color that pairs well with many shoe options, and the white sneakers can add a sporty and casual touch to the outfit.\n"
]
}
],
"source": [
"for path in paths:\n",
"    # Encode the suggested image and ask the guardrail prompt whether it matches\n",
"    suggested_image = encode_image_to_base64(path)\n",
"    match = json.loads(check_match(encoded_image, suggested_image))\n",
" \n",
" # Display the image and the analysis results\n",
" if match[\"answer\"] == 'yes':\n",
" display(Image(filename=path))\n",
" print(\"The items match!\")\n",
" print(match[\"reason\"])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can observe that the initial list of potential items has been further refined, resulting in a more curated selection that aligns well with the outfit. Additionally, the model provides explanations for why each item is considered a good match, offering valuable insights into the decision-making process."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Conclusion\n",
"\n",
"In this Jupyter Notebook, we explored the application of GPT-4o and other machine learning techniques to the domain of fashion. We demonstrated how to analyze images of clothing items, extract relevant features, and use this information to find matching items that complement an original outfit. Through the implementation of guardrails and self-correction mechanisms, we refined the model's suggestions to ensure they are accurate and contextually relevant.\n",
"\n",
"This approach has several practical uses in the real world, including:\n",
"\n",
"1. **Personalized Shopping Assistants**: Retailers can use this technology to offer personalized outfit recommendations to customers, enhancing the shopping experience and increasing customer satisfaction.\n",
"2. **Virtual Wardrobe Applications**: Users can upload images of their own clothing items to create a virtual wardrobe and receive suggestions for new items that match their existing pieces.\n",
"3. **Fashion Design and Styling**: Fashion designers and stylists can use this tool to experiment with different combinations and styles, streamlining the creative process.\n",
"\n",
"However, one of the considerations to keep in mind is **cost**. The use of LLMs and image analysis models can incur costs, especially if used extensively. It's important to consider the cost-effectiveness of implementing these technologies.\n",
"\n",
"Overall, this notebook serves as a foundation for further exploration and development in the intersection of fashion and AI, opening doors to more personalized and intelligent fashion recommendation systems."