# String Distance

One of the simplest ways to compare the string output of an LLM or chain against a reference label is to use a string distance measure such as Levenshtein or postfix distance. Combined with approximate (fuzzy) matching criteria, this is enough for very basic unit testing.

This is available through the `string_distance` evaluator, which uses distance metrics from the [rapidfuzz](https://github.com/maxbachmann/RapidFuzz) library.

**Note:** The returned scores are _distances_, meaning lower is typically "better".

For more information, check out the reference docs for the [StringDistanceEvalChain](https://api.python.langchain.com/en/latest/evaluation/langchain.evaluation.string_distance.base.StringDistanceEvalChain.html#langchain.evaluation.string_distance.base.StringDistanceEvalChain).

In [1]:
# %pip install rapidfuzz

In [2]:
from langchain.evaluation import load_evaluator

evaluator = load_evaluator("string_distance")

In [3]:
evaluator.evaluate_strings(
    prediction="The job is completely done.",
    reference="The job is done",
)

{'score': 12}

In [4]:
# The results are purely character-based, so the metric is less useful where negation is concerned
evaluator.evaluate_strings(
    prediction="The job is done.",
    reference="The job isn't done",
)

{'score': 4}
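To make the two scores above concrete, here is a minimal pure-Python sketch of Levenshtein distance: the smallest number of single-character insertions, deletions, or substitutions needed to turn one string into the other. It is not the rapidfuzz implementation the evaluator relies on, and the names `levenshtein` and `is_close` are hypothetical helpers introduced only for illustration. `is_close` shows one possible fuzzy-matching criterion for a basic unit test, assuming a threshold you choose yourself.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the number of single-character edits needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep rows short
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + 1,               # delete ca
                    current[j - 1] + 1,            # insert cb
                    previous[j - 1] + (ca != cb),  # substitute, or free match
                )
            )
        previous = current
    return previous[-1]


def is_close(prediction: str, reference: str, max_distance: int) -> bool:
    """Hypothetical fuzzy-match check: pass if the edit distance is small enough."""
    return levenshtein(prediction, reference) <= max_distance


print(levenshtein("The job is completely done.", "The job is done"))  # 12
print(levenshtein("The job is done.", "The job isn't done"))  # 4
```

The second pair differs by only four edits even though the meaning is reversed, so a threshold such as `max_distance=5` would accept it. This is why a purely character-based check is best kept to very basic tests.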

## Configure the String Distance Metric

By default, the `StringDistanceEvalChain` uses Levenshtein distance, but it also supports other string distance algorithms. Select one with the `distance` argument.

In [5]:
from langchain.evaluation import StringDistance

list(StringDistance)

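[<StringDistance.DAMERAU_LEVENSHTEIN: 'damerau_levenshtein'>,
 <StringDistance.LEVENSHTEIN: 'levenshtein'>,
 <StringDistance.JARO: 'jaro'>,
 <StringDistance.JARO_WINKLER: 'jaro_winkler'>]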

In [6]:
jaro_evaluator = load_evaluator(
    "string_distance", distance=StringDistance.JARO, requires_reference=True
)

In [7]:
jaro_evaluator.evaluate_strings(
    prediction="The job is completely done.",
    reference="The job is done",
)

{'score': 0.19259259259259254}

In [8]:
jaro_evaluator.evaluate_strings(
    prediction="The job is done.",
    reference="The job isn't done",
)

{'score': 0.12083333333333324}
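Unlike the raw Levenshtein edit count, the Jaro scores above fall between 0 and 1, because they are reported as one minus the Jaro similarity. The following is a minimal pure-Python sketch under the textbook Jaro definition (characters match if equal and no further apart than `max(len) // 2 - 1` positions; transpositions are half the number of matched characters that appear out of order). It is an illustration rather than the rapidfuzz code the evaluator actually calls, and `jaro_distance` is a hypothetical helper name.

```python
def jaro_distance(s1: str, s2: str) -> float:
    """Return 1 - Jaro similarity; 0.0 means identical strings."""
    if s1 == s2:
        return 0.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 1.0

    # Characters only count as matching within this many positions.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 1.0

    # Walk both strings' matched characters in order and count disagreements.
    out_of_order = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2

    similarity = (
        matches / len1 + matches / len2 + (matches - transpositions) / matches
    ) / 3
    return 1.0 - similarity


print(round(jaro_distance("The job is completely done.", "The job is done"), 4))  # 0.1926
print(round(jaro_distance("The job is done.", "The job isn't done"), 4))  # 0.1208
```

As with Levenshtein, the negated sentence scores as closer to its reference than the paraphrase does, so switching metrics changes the scale of the score but not its character-based nature.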