From fb62f2be7002369d0e16fd347138a2423fcf7d8f Mon Sep 17 00:00:00 2001
From: Fielding Johnston
Date: Sun, 23 Jul 2023 20:25:14 -0500
Subject: [PATCH] nit: small typo in evaluation module docs (#8155)

Hopefully, this doesn't come across as nitpicky! That isn't the
intention. I only noticed it because I enjoy reading the documentation,
and when I hit a mental road bump it is usually due to a missing word or
something =)

@baskaryan
---
 docs/docs_skeleton/docs/modules/evaluation/index.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/docs_skeleton/docs/modules/evaluation/index.mdx b/docs/docs_skeleton/docs/modules/evaluation/index.mdx
index 87971b66fd..51c1a94942 100644
--- a/docs/docs_skeleton/docs/modules/evaluation/index.mdx
+++ b/docs/docs_skeleton/docs/modules/evaluation/index.mdx
@@ -6,7 +6,7 @@ import DocCardList from "@theme/DocCardList";
 
 # Evaluation
 
-Language models can be unpredictable. This makes it challenging to ship reliable applications to production, where repeatable, useful outcomes across diverse inputs are a minimum requirement. Tests help demonstrate each component in an LLM application can produce the required or expected functionality. These tests also safeguard against regressions while you improve interconnected pieces of an integrated system. However, measuring the quality of generated text can be challenging. It can be hard to agree on the right set of metrics for your application, and it can be difficult to translate those into better performance. Furthermore, it's common to lack sufficient evaluation data adequately test the range of inputs and expected outputs for each component when you're just getting started. The LangChain community is building open source tools and guides to help address these challenges.
+Language models can be unpredictable. This makes it challenging to ship reliable applications to production, where repeatable, useful outcomes across diverse inputs are a minimum requirement. Tests help demonstrate each component in an LLM application can produce the required or expected functionality. These tests also safeguard against regressions while you improve interconnected pieces of an integrated system. However, measuring the quality of generated text can be challenging. It can be hard to agree on the right set of metrics for your application, and it can be difficult to translate those into better performance. Furthermore, it's common to lack sufficient evaluation data to adequately test the range of inputs and expected outputs for each component when you're just getting started. The LangChain community is building open source tools and guides to help address these challenges.
 
 LangChain exposes different types of evaluators for common types of evaluation. Each type has off-the-shelf implementations you can use to get started, as well as an extensible API so you can create your own or contribute improvements for everyone to use. The following sections have example notebooks for you to get started.