{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Recommendation using embeddings and nearest neighbor search\n", "\n", "Recommendations are widespread across the web.\n", "\n", "- 'Bought that item? Try these similar items.'\n", "- 'Enjoy that book? Try these similar titles.'\n", "- 'Not the help page you were looking for? Try these similar pages.'\n", "\n", "This notebook demonstrates how to use embeddings to find similar items to recommend. In particular, we use [AG's corpus of news articles](http://groups.di.unipi.it/~gulli/AG_corpus_of_news_articles.html) as our dataset.\n", "\n", "Our model will answer the question: given an article, what other articles are most similar to it?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1. Imports\n", "\n", "First, let's import the packages and functions we'll need for later. If you don't have these, you'll need to install them. You can install them via your terminal by running `pip install {package_name}`, e.g. `pip install pandas`." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# imports\n", "import pandas as pd\n", "import pickle\n", "\n", "from openai.embeddings_utils import (\n", " get_embedding,\n", " distances_from_embeddings,\n", " tsne_components_from_embeddings,\n", " chart_from_components,\n", " indices_of_nearest_neighbors_from_distances,\n", ")\n", "\n", "# constants\n", "EMBEDDING_MODEL = \"text-embedding-ada-002\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2. Load data\n", "\n", "Next, let's load the AG news data and see what it looks like." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | title | description | label_int | label |
|---|---|---|---|---|
| 0 | World Briefings | BRITAIN: BLAIR WARNS OF CLIMATE THREAT Prime M... | 1 | World |
| 1 | Nvidia Puts a Firewall on a Motherboard (PC Wo... | PC World - Upcoming chip set will include buil... | 4 | Sci/Tech |
| 2 | Olympic joy in Greek, Chinese press | Newspapers in Greece reflect a mixture of exhi... | 2 | Sports |
| 3 | U2 Can iPod with Pictures | SAN JOSE, Calif. -- Apple Computer (Quote, Cha... | 4 | Sci/Tech |
| 4 | The Dream Factory | Any product, any shape, any size -- manufactur... | 4 | Sci/Tech |