openai-cookbook/apps/embeddings-playground/README.md

42 lines
1.6 KiB
Markdown
Raw Normal View History

# Embeddings Playground
[`embeddings_playground.py`](embeddings_playground.py) is a single-page streamlit app for experimenting with OpenAI embeddings.
## Installation
Before running, install required dependencies with:
`pip install -r apps/embeddings-playground/requirements.txt`
(You may need to change the path to match your local path.)
Verify installation of streamlit with `streamlit hello`.
## Usage
Run the script with:
`streamlit run apps/embeddings-playground/embeddings_playground.py`
(Again, you may need to change the path to match your local path.)
In the app, first select your choice of:
- distance metric (we recommend cosine)
- embedding model (we recommend `text-embedding-ada-002` for most use cases, as of May 2023)
Then, enter a variable number of strings to compare. Click `rank` to see:
- the ranked list of strings, sorted by distance from the first string
- a heatmap showing the distance between each pair of strings
## Example
Here's an example distance matrix for 8 example strings related to `The sky is blue`:
![example distance matrix](example_distance_matrix.png)
From these distance pairs, you can see:
- embeddings measure topical similarity more than logical similarity (e.g., `The sky is blue` is very close to `The sky is not blue`)
- punctuation affects embeddings (e.g., `"THE. SKY. IS. BLUE!"` is only third closest to `The sky is blue`)
- within-language pairs are stronger than across-language pairs (e.g., `El cielo as azul` is closer to `El cielo es rojo` than to `The sky is blue`)
Experiment with your own strings to see what you can learn.