Merge pull request #100 from scottire/main

Add W&B embedding projector example
1 year ago · 5b5f228121
parent c31fe72f1a f4c865cfef
commit 5b5f228121
1 changed files with 118 additions and 0 deletions
--- a/examples/Visualizing_embeddings_in_W&B.ipynb
+++ b/examples/Visualizing_embeddings_in_W&B.ipynb
@@ -0,0 +1,118 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Visualizing the embeddings in W&B\n",
+    "\n",
+    "We will upload the data to [Weights & Biases](http://wandb.ai) and use an [Embedding Projector](https://docs.wandb.ai/ref/app/features/panels/weave/embedding-projector) to visualize the embeddings with common dimensionality reduction algorithms such as PCA, UMAP, and t-SNE. The dataset is created in the [Obtain_dataset Notebook](Obtain_dataset.ipynb)."
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## What is Weights & Biases?\n",
+    "\n",
+    "[Weights & Biases](http://wandb.ai) is a machine learning platform used by OpenAI and other ML teams to build better models faster. They use it to quickly track experiments, evaluate model performance, reproduce models, visualize results, and share findings with colleagues."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 1. Log the data to W&B\n",
+    "\n",
+    "We create a [W&B Table](https://docs.wandb.ai/guides/data-vis/log-tables) containing the original data and the embeddings. Each review becomes one row, and each of the 1536 embedding floats gets its own column named `emb_{i}` (`emb_0` through `emb_1535`)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from ast import literal_eval\n",
+    "import numpy as np\n",
+    "import pandas as pd\n",
+    "\n",
+    "# Load the embeddings\n",
+    "datafile_path = \"data/fine_food_reviews_with_embeddings_1k.csv\"\n",
+    "df = pd.read_csv(datafile_path)\n",
+    "\n",
+    "# Parse the stringified embeddings safely into a 2D NumPy array of floats\n",
+    "matrix = np.array(df.embedding.apply(literal_eval).to_list())"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import wandb\n",
+    "\n",
+    "original_cols = df.columns[1:-1].tolist()\n",
+    "embedding_cols = ['emb_'+str(idx) for idx in range(len(matrix[0]))]\n",
+    "table_cols = original_cols + embedding_cols\n",
+    "\n",
+    "with wandb.init(project='openai_embeddings'):\n",
+    "    table = wandb.Table(columns=table_cols)\n",
+    "    for i, row in enumerate(df.to_dict(orient=\"records\")):\n",
+    "        original_data = [row[col_name] for col_name in original_cols]\n",
+    "        embedding_data = matrix[i].tolist()\n",
+    "        table.add_data(*(original_data + embedding_data))\n",
+    "    wandb.log({'openai_embedding_table': table})"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 2. Render as 2D Projection"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "After navigating to the W&B run link, we click the ⚙️ icon in the top right of the Table and change \"Render As:\" to \"Combined 2D Projection\"."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Example: http://wandb.me/openai_embeddings"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.15"
+  },
+  "vscode": {
+   "interpreter": {
+    "hash": "365536dcbde60510dc9073d6b991cd35db2d9bac356a11f5b64279a5e6708b97"
+   }
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
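
Note on the table-building step: the flattening done in the second code cell can be sketched without pandas or a W&B login, which makes the row layout easy to check locally. This is a minimal sketch, not the notebook's code: `build_table_rows`, the `embedding_key` parameter, and the toy records are hypothetical, and unlike the notebook (which selects `df.columns[1:-1]`) it keeps every column except the embedding one.

```python
from ast import literal_eval


def build_table_rows(records, embedding_key="embedding"):
    """Flatten records holding a stringified embedding into columns and rows.

    Hypothetical helper: `records` is a list of dicts, such as the output of
    `df.to_dict(orient="records")` in the notebook.
    """
    if not records:
        return [], []

    # Parse each stringified list safely, without executing arbitrary code.
    embeddings = [literal_eval(rec[embedding_key]) for rec in records]

    # Every row needs the same number of emb_{i} columns.
    dim = len(embeddings[0])
    if any(len(emb) != dim for emb in embeddings):
        raise ValueError("all embeddings must have the same length")

    # Assumption: keep every non-embedding column, in its original order.
    original_cols = [col for col in records[0] if col != embedding_key]
    embedding_cols = [f"emb_{i}" for i in range(dim)]

    rows = [
        [rec[col] for col in original_cols] + [float(x) for x in emb]
        for rec, emb in zip(records, embeddings)
    ]
    return original_cols + embedding_cols, rows


# Toy data made up for illustration; real embeddings have 1536 floats.
records = [
    {"Score": 5, "Summary": "Great", "embedding": "[0.1, 0.2, 0.3]"},
    {"Score": 1, "Summary": "Bad", "embedding": "[0.4, 0.5, 0.6]"},
]
columns, rows = build_table_rows(records)
```

The resulting `columns` and `rows` could then be passed to `wandb.Table(columns=columns, data=rows)` inside a `wandb.init(...)` block, as an alternative to calling `table.add_data` once per row.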