{ "cells": [ { "cell_type": "markdown", "metadata": { "tags": [] }, "source": [ "# Embedding Distance\n", "[![Open In Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/extras/guides/evaluation/string/embedding_distance.ipynb)\n", "\n", "To measure semantic similarity (or dissimilarity) between a prediction and a reference label string, you could use a vector vector distance metric the two embedded representations using the `embedding_distance` evaluator.[[1]](#cite_note-1)\n", "\n", "\n", "**Note:** This returns a **distance** score, meaning that the lower the number, the **more** similar the prediction is to the reference, according to their embedded representation.\n", "\n", "Check out the reference docs for the [EmbeddingDistanceEvalChain](https://api.python.langchain.com/en/latest/evaluation/langchain.evaluation.embedding_distance.base.EmbeddingDistanceEvalChain.html#langchain.evaluation.embedding_distance.base.EmbeddingDistanceEvalChain) for more info." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "tags": [] }, "outputs": [], "source": [ "from langchain.evaluation import load_evaluator\n", "\n", "evaluator = load_evaluator(\"embedding_distance\")" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "{'score': 0.0966466944859925}" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "evaluator.evaluate_strings(prediction=\"I shall go\", reference=\"I shan't go\")" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "{'score': 0.03761174337464557}" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "evaluator.evaluate_strings(prediction=\"I shall go\", reference=\"I will go\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Select the Distance Metric\n", "\n", "By default, the evalutor uses cosine distance. You can choose a different distance metric if you'd like. " ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "[,\n", " ,\n", " ,\n", " ,\n", " ]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from langchain.evaluation import EmbeddingDistance\n", "\n", "list(EmbeddingDistance)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "tags": [] }, "outputs": [], "source": [ "# You can load by enum or by raw python string\n", "evaluator = load_evaluator(\n", " \"embedding_distance\", distance_metric=EmbeddingDistance.EUCLIDEAN\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Select Embeddings to Use\n", "\n", "The constructor uses `OpenAI` embeddings by default, but you can configure this however you want. 
 { "cell_type": "markdown", "metadata": {}, "source": [ "## Select Embeddings to Use\n", "\n", "The constructor uses `OpenAI` embeddings by default, but you can configure this however you want.\n", "Below, we use local HuggingFace embeddings instead." ] },
 { "cell_type": "code", "execution_count": 6, "metadata": { "tags": [] }, "outputs": [], "source": [ "from langchain.embeddings import HuggingFaceEmbeddings\n", "\n", "embedding_model = HuggingFaceEmbeddings()\n", "hf_evaluator = load_evaluator(\"embedding_distance\", embeddings=embedding_model)" ] },
 { "cell_type": "code", "execution_count": 7, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "{'score': 0.5486443280477362}" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hf_evaluator.evaluate_strings(prediction=\"I shall go\", reference=\"I shan't go\")" ] },
 { "cell_type": "code", "execution_count": 8, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "{'score': 0.21018880025138598}" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "hf_evaluator.evaluate_strings(prediction=\"I shall go\", reference=\"I will go\")" ] },
 { "cell_type": "markdown", "metadata": {}, "source": [ "<a name=\"cite_note-1\"></a>1. Note: When it comes to semantic similarity, this often gives better results than older string distance metrics (such as those in the [StringDistanceEvalChain](https://api.python.langchain.com/en/latest/evaluation/langchain.evaluation.string_distance.base.StringDistanceEvalChain.html#langchain.evaluation.string_distance.base.StringDistanceEvalChain)), though it tends to be less reliable than evaluators that use the LLM directly (such as the [QAEvalChain](https://api.python.langchain.com/en/latest/evaluation/langchain.evaluation.qa.eval_chain.QAEvalChain.html#langchain.evaluation.qa.eval_chain.QAEvalChain) or [LabeledCriteriaEvalChain](https://api.python.langchain.com/en/latest/evaluation/langchain.evaluation.criteria.eval_chain.LabeledCriteriaEvalChain.html#langchain.evaluation.criteria.eval_chain.LabeledCriteriaEvalChain))." ] }
 ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.2" } }, "nbformat": 4, "nbformat_minor": 4 }