2023-05-20 01:15:24 +00:00

# Embeddings Playground

[`embeddings_playground.py`](embeddings_playground.py) is a single-page Streamlit app for experimenting with OpenAI embeddings.

## Installation

Before running the app, install the required dependencies with:

`pip install -r apps/embeddings-playground/requirements.txt`

(You may need to adjust the path to match your local directory layout.)

Verify that Streamlit is installed with `streamlit hello`.

## Usage

Run the app with:

`streamlit run apps/embeddings-playground/embeddings_playground.py`

(Again, you may need to adjust the path to match your local directory layout.)

In the app, first select:

- a distance metric (we recommend cosine)
- an embedding model (we recommend `text-embedding-ada-002` for most use cases, as of May 2023)

Then, enter as many strings as you want to compare. Click `rank` to see:

- the strings ranked by distance from the first string
- a heatmap of the distance between each pair of strings
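
To make the two outputs concrete, here is a minimal sketch of ranking by cosine distance and building the pairwise matrix that a heatmap would display. It is an illustration only, not the app's actual code: the function names are hypothetical, and the tiny 2-D vectors stand in for real embeddings, which come from the embedding model and have many more dimensions.

```python
import math


def cosine_distance(a, b):
    """Cosine distance: 1 minus the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def distance_matrix(embeddings):
    """Pairwise distances between all embeddings: the data behind a heatmap."""
    return [[cosine_distance(a, b) for b in embeddings] for a in embeddings]


def rank_by_first(strings, embeddings):
    """Pair each string with its distance from the first string, closest first."""
    row = [cosine_distance(embeddings[0], e) for e in embeddings]
    return sorted(zip(strings, row), key=lambda pair: pair[1])


# Hypothetical placeholder strings and toy vectors (assumptions for illustration)
strings = ["first", "second", "third"]
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

matrix = distance_matrix(embeddings)
ranking = rank_by_first(strings, embeddings)
for text, distance in ranking:
    print(f"{distance:.3f}  {text}")
# 0.000  first
# 0.293  third
# 1.000  second
```

The first string always ranks at the top with distance 0, because every vector has zero cosine distance to itself. For the same reason, the diagonal of the pairwise matrix is zero, and the matrix is symmetric.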

## Example

Here's an example distance matrix for 8 strings related to `The sky is blue`:

![example distance matrix](example_distance_matrix.png)

From these pairwise distances, you can see that:

- embeddings measure topical similarity more than logical similarity (e.g., `The sky is blue` is very close to `The sky is not blue`)
- punctuation affects embeddings (e.g., `"THE. SKY. IS. BLUE!"` is only third closest to `The sky is blue`)
- within-language pairs are closer than across-language pairs (e.g., `El cielo es azul` is closer to `El cielo es rojo` than to `The sky is blue`)

Experiment with your own strings to see what you can learn.