langchain/docs/extras/guides/evaluation
Zander Chase cc60fed3be
Add a Pairwise Comparison Chain (#6703)
Notebook shows preference scoring between two chains and reports the Wilson
score interval and p-value.

I think I'll add the option to insert ground truth labels, but that doesn't
have to be in this PR.
2023-06-26 20:47:41 -07:00
agent_benchmarking.ipynb nit (#6305) 2023-06-16 16:21:27 -07:00
agent_vectordb_sota_pg.ipynb Docs nit (#6350) 2023-06-18 20:58:12 -07:00
benchmarking_template.ipynb Doc refactor (#6300) 2023-06-16 11:52:56 -07:00
comparisons.ipynb Add a Pairwise Comparison Chain (#6703) 2023-06-26 20:47:41 -07:00
criteria_eval_chain.ipynb Update String Evaluator (#6615) 2023-06-26 14:16:14 -07:00
data_augmented_question_answering.ipynb Doc refactor (#6300) 2023-06-16 11:52:56 -07:00
generic_agent_evaluation.ipynb Doc refactor (#6300) 2023-06-16 11:52:56 -07:00
huggingface_datasets.ipynb Doc refactor (#6300) 2023-06-16 11:52:56 -07:00
index.mdx fix eval guide links (#6319) 2023-06-16 17:53:46 -07:00
llm_math.ipynb Doc refactor (#6300) 2023-06-16 11:52:56 -07:00
openapi_eval.ipynb docs/fix links (#6498) 2023-06-20 14:06:50 -07:00
qa_benchmarking_pg.ipynb Doc refactor (#6300) 2023-06-16 11:52:56 -07:00
qa_benchmarking_sota.ipynb Doc refactor (#6300) 2023-06-16 11:52:56 -07:00
qa_generation.ipynb Doc refactor (#6300) 2023-06-16 11:52:56 -07:00
question_answering.ipynb Doc refactor (#6300) 2023-06-16 11:52:56 -07:00
sql_qa_benchmarking_chinook.ipynb Doc refactor (#6300) 2023-06-16 11:52:56 -07:00