You cannot select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
125 lines
3.7 KiB
Plaintext
125 lines
3.7 KiB
Plaintext
1 year ago
|
{
|
||
|
"cells": [
|
||
|
{
|
||
12 months ago
|
"attachments": {},
|
||
1 year ago
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
9 months ago
|
"## Visualizing embeddings in W&B\n",
|
||
1 year ago
|
"\n",
|
||
9 months ago
|
"We will upload the data to [Weights & Biases](http://wandb.ai) and use an [Embedding Projector](https://docs.wandb.ai/ref/app/features/panels/weave/embedding-projector) to visualize the embeddings using common dimension reduction algorithms like PCA, UMAP, and t-SNE. The dataset is created in the [Get_embeddings_from_dataset Notebook](Get_embeddings_from_dataset.ipynb)."
|
||
1 year ago
|
]
|
||
|
},
|
||
1 year ago
|
{
|
||
|
"attachments": {},
|
||
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"## What is Weights & Biases?\n",
|
||
|
"\n",
|
||
1 year ago
|
"[Weights & Biases](http://wandb.ai) is a machine learning platform used by OpenAI and other ML teams to build better models faster. They use it to quickly track experiments, evaluate model performance, reproduce models, visualize results, and share findings with colleagues."
|
||
1 year ago
|
]
|
||
|
},
|
||
1 year ago
|
{
|
||
12 months ago
|
"attachments": {},
|
||
1 year ago
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"### 1. Log the data to W&B\n",
|
||
|
"\n",
|
||
|
"We create a [W&B Table](https://docs.wandb.ai/guides/data-vis/log-tables) with the original data and the embeddings. Each review is a new row and the 1536 embedding floats are given their own column named `emb_{i}`."
|
||
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": 2,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"import pandas as pd\n",
|
||
|
"from sklearn.manifold import TSNE\n",
|
||
|
"import numpy as np\n",
|
||
12 months ago
|
"from ast import literal_eval\n",
|
||
1 year ago
|
"\n",
|
||
|
"# Load the embeddings\n",
|
||
|
"datafile_path = \"data/fine_food_reviews_with_embeddings_1k.csv\"\n",
|
||
|
"df = pd.read_csv(datafile_path)\n",
|
||
|
"\n",
|
||
|
"# Convert to a list of lists of floats\n",
|
||
12 months ago
|
"matrix = np.array(df.embedding.apply(literal_eval).to_list())"
|
||
1 year ago
|
]
|
||
|
},
|
||
|
{
|
||
|
"cell_type": "code",
|
||
|
"execution_count": null,
|
||
|
"metadata": {},
|
||
|
"outputs": [],
|
||
|
"source": [
|
||
|
"import wandb\n",
|
||
|
"\n",
|
||
|
"original_cols = df.columns[1:-1].tolist()\n",
|
||
|
"embedding_cols = ['emb_'+str(idx) for idx in range(len(matrix[0]))]\n",
|
||
|
"table_cols = original_cols + embedding_cols\n",
|
||
|
"\n",
|
||
|
"with wandb.init(project='openai_embeddings'):\n",
|
||
|
" table = wandb.Table(columns=table_cols)\n",
|
||
|
" for i, row in enumerate(df.to_dict(orient=\"records\")):\n",
|
||
|
" original_data = [row[col_name] for col_name in original_cols]\n",
|
||
|
" embedding_data = matrix[i].tolist()\n",
|
||
|
" table.add_data(*(original_data + embedding_data))\n",
|
||
|
" wandb.log({'openai_embedding_table': table})"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
12 months ago
|
"attachments": {},
|
||
1 year ago
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"### 2. Render as 2D Projection"
|
||
|
]
|
||
|
},
|
||
|
{
|
||
12 months ago
|
"attachments": {},
|
||
1 year ago
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"After navigating to the W&B run link, we click the ⚙️ icon in the top right of the Table and change \"Render As:\" to \"Combined 2D Projection\". "
|
||
|
]
|
||
|
},
|
||
|
{
|
||
12 months ago
|
"attachments": {},
|
||
1 year ago
|
"cell_type": "markdown",
|
||
|
"metadata": {},
|
||
|
"source": [
|
||
|
"Example: http://wandb.me/openai_embeddings"
|
||
|
]
|
||
|
}
|
||
|
],
|
||
|
"metadata": {
|
||
|
"kernelspec": {
|
||
|
"display_name": "Python 3 (ipykernel)",
|
||
|
"language": "python",
|
||
|
"name": "python3"
|
||
|
},
|
||
|
"language_info": {
|
||
|
"codemirror_mode": {
|
||
|
"name": "ipython",
|
||
|
"version": 3
|
||
|
},
|
||
|
"file_extension": ".py",
|
||
|
"mimetype": "text/x-python",
|
||
|
"name": "python",
|
||
|
"nbconvert_exporter": "python",
|
||
|
"pygments_lexer": "ipython3",
|
||
|
"version": "3.9.15"
|
||
|
},
|
||
|
"vscode": {
|
||
|
"interpreter": {
|
||
|
"hash": "365536dcbde60510dc9073d6b991cd35db2d9bac356a11f5b64279a5e6708b97"
|
||
|
}
|
||
|
}
|
||
|
},
|
||
|
"nbformat": 4,
|
||
|
"nbformat_minor": 4
|
||
|
}
|