From c22c0481211ffe08738fd196b5f46b8a33e353eb Mon Sep 17 00:00:00 2001 From: Scott Date: Wed, 1 Feb 2023 13:52:08 +0000 Subject: [PATCH 1/3] Add W&B embedding projector example --- examples/Visualizing_embeddings_in_W&B.ipynb | 108 +++++++++++++++++++ 1 file changed, 108 insertions(+) create mode 100644 examples/Visualizing_embeddings_in_W&B.ipynb diff --git a/examples/Visualizing_embeddings_in_W&B.ipynb b/examples/Visualizing_embeddings_in_W&B.ipynb new file mode 100644 index 00000000..3b11325e --- /dev/null +++ b/examples/Visualizing_embeddings_in_W&B.ipynb @@ -0,0 +1,108 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Visualizing the embeddings in W&B\n", + "\n", + "We will upload the data to [Weights & Biases](http://wandb.ai) and use an [Embedding Projector](https://docs.wandb.ai/ref/app/features/panels/weave/embedding-projector) to visualize the embeddings using common dimension reduction algorithms like PCA, UMAP, and t-SNE. The dataset is created in the [Obtain_dataset Notebook](Obtain_dataset.ipynb)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 1. Log the data to W&B\n", + "\n", + "We create a [W&B Table](https://docs.wandb.ai/guides/data-vis/log-tables) with the original data and the embeddings. Each review is a new row, and each of the 1536 embedding floats gets its own column named `emb_{i}`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "from ast import literal_eval\n", + "import numpy as np\n", + "\n", + "# Load the embeddings\n", + "datafile_path = \"data/fine_food_reviews_with_embeddings_1k.csv\"\n", + "df = pd.read_csv(datafile_path)\n", + "\n", + "# Parse the stringified embeddings into a 2D NumPy array of floats\n", + "matrix = np.array(df.embedding.apply(literal_eval).to_list())" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import wandb\n", + "\n", + "original_cols = df.columns[1:-1].tolist()\n", + "embedding_cols = ['emb_'+str(idx) for idx in range(len(matrix[0]))]\n", + "table_cols = original_cols + embedding_cols\n", + "\n", + "with wandb.init(project='openai_embeddings'):\n", + " table = wandb.Table(columns=table_cols)\n", + " for i, row in enumerate(df.to_dict(orient=\"records\")):\n", + " original_data = [row[col_name] for col_name in original_cols]\n", + " embedding_data = matrix[i].tolist()\n", + " table.add_data(*(original_data + embedding_data))\n", + " wandb.log({'openai_embedding_table': table})" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 2. Render as 2D Projection" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "After navigating to the W&B run link, we click the ⚙️ icon in the top right of the Table and change \"Render As:\" to \"Combined 2D Projection\". 
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Example: http://wandb.me/openai_embeddings" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.15" + }, + "vscode": { + "interpreter": { + "hash": "365536dcbde60510dc9073d6b991cd35db2d9bac356a11f5b64279a5e6708b97" + } + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} From 20bff60b9835f336fa3f0a2e5b722432165e42af Mon Sep 17 00:00:00 2001 From: Scott Condron Date: Thu, 9 Feb 2023 15:39:03 +0000 Subject: [PATCH 2/3] Add W&B Blurb --- examples/Visualizing_embeddings_in_W&B.ipynb | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/examples/Visualizing_embeddings_in_W&B.ipynb b/examples/Visualizing_embeddings_in_W&B.ipynb index 3b11325e..0056ffe3 100644 --- a/examples/Visualizing_embeddings_in_W&B.ipynb +++ b/examples/Visualizing_embeddings_in_W&B.ipynb @@ -9,6 +9,16 @@ "We will upload the data to [Weights & Biases](http://wandb.ai) and use an [Embedding Projector](https://docs.wandb.ai/ref/app/features/panels/weave/embedding-projector) to visualize the embeddings using common dimension reduction algorithms like PCA, UMAP, and t-SNE. The dataset is created in the [Obtain_dataset Notebook](Obtain_dataset.ipynb)." ] }, + { + "attachments": {}, + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## What is Weights & Biases?\n", + "\n", + "[Weights & Biases](http://wandb.ai) is a machine learning platform used by OpenAI and other top ML teams to build better models faster. They use it to quickly track experiments, evaluate model performance, reproduce models, visualize results, and share findings with colleagues." 
+ ] + }, { "cell_type": "markdown", "metadata": {}, From 221f5de9f82df0861378ac4f0b18ea930fdcd055 Mon Sep 17 00:00:00 2001 From: Scott Condron Date: Thu, 9 Feb 2023 18:33:15 +0000 Subject: [PATCH 3/3] Removing top from blurb --- examples/Visualizing_embeddings_in_W&B.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/examples/Visualizing_embeddings_in_W&B.ipynb b/examples/Visualizing_embeddings_in_W&B.ipynb index 0056ffe3..da5bae55 100644 --- a/examples/Visualizing_embeddings_in_W&B.ipynb +++ b/examples/Visualizing_embeddings_in_W&B.ipynb @@ -16,7 +16,7 @@ "source": [ "## What is Weights & Biases?\n", "\n", - "[Weights & Biases](http://wandb.ai) is a machine learning platform used by OpenAI and other top ML teams to build better models faster. They use it to quickly track experiments, evaluate model performance, reproduce models, visualize results, and share findings with colleagues." + "[Weights & Biases](http://wandb.ai) is a machine learning platform used by OpenAI and other ML teams to build better models faster. They use it to quickly track experiments, evaluate model performance, reproduce models, visualize results, and share findings with colleagues." ] }, {