langchain/docs/extras/guides/evaluation/benchmarking_template.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "a175c650",
   "metadata": {},
   "source": [
    "# Benchmarking Template\n",
    "\n",
    "This is an example notebook that can be used to create a benchmarking notebook for a task of your choice. Evaluation is hard, so we welcome any contributions that make it easier for people to experiment."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "984169ca",
   "metadata": {},
   "source": [
    "It is highly recommended that you do any evaluation/benchmarking with tracing enabled. See [here](https://langchain.readthedocs.io/en/latest/tracing.html) for an explanation of what tracing is and how to set it up."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "9fe4d1b4",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Comment this out if you are NOT using tracing\n",
    "import os\n",
    "\n",
    "os.environ[\"LANGCHAIN_HANDLER\"] = \"langchain\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0f66405e",
   "metadata": {},
   "source": [
    "## Loading the data\n",
    "\n",
    "First, let's load the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "79402a8f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# This notebook should show how to load the dataset from LangChainDatasets on Hugging Face\n",
    "\n",
    "# Please upload your dataset to https://huggingface.co/LangChainDatasets\n",
    "\n",
    "# The value passed into `load_dataset` should NOT have the `LangChainDatasets/` prefix\n",
    "from langchain.evaluation.loading import load_dataset\n",
    "\n",
    "dataset = load_dataset(\"TODO\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8a16b75d",
   "metadata": {},
   "source": [
    "## Setting up a chain\n",
    "\n",
    "This next section should have an example of setting up a chain that can be run on this dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a2661ce0",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "6c0062e7",
   "metadata": {},
   "source": [
    "## Make a prediction\n",
    "\n",
    "First, we can make predictions one datapoint at a time. Working at this level of granularity lets us explore the outputs in detail, and it is much cheaper than running over many datapoints."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "d28c5e7d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Example of running the chain on a single datapoint (`dataset[0]`) goes here"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d0c16cd7",
   "metadata": {},
   "source": [
    "## Make many predictions\n",
    "\n",
    "Now we can make predictions over the whole dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "24b4c66e",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Example of running the chain on many predictions goes here\n",
    "\n",
    "# Sometimes it's as simple as `chain.apply(dataset)`\n",
    "\n",
    "# Other times you may want to write a for loop to catch errors"
   ]
  },
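  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3f8a2c91",
   "metadata": {},
   "outputs": [],
   "source": [
    "# A minimal sketch of the for-loop approach, not a prescribed method:\n",
    "# run the chain on each datapoint and record failures instead of stopping.\n",
    "# `run_chain` and `examples` are hypothetical stand-ins; replace them with\n",
    "# your chain call and the dataset loaded above.\n",
    "def run_chain(example):\n",
    "    return {\"output\": example[\"question\"].upper()}\n",
    "\n",
    "\n",
    "# The empty dict raises KeyError, simulating a failing datapoint\n",
    "examples = [{\"question\": \"a\"}, {\"question\": \"b\"}, {}]\n",
    "\n",
    "predictions, errors = [], []\n",
    "for i, example in enumerate(examples):\n",
    "    try:\n",
    "        predictions.append(run_chain(example))\n",
    "    except Exception as e:  # keep going, but remember what failed and why\n",
    "        predictions.append(None)\n",
    "        errors.append((i, e))\n",
    "\n",
    "print(f\"{len(errors)} of {len(examples)} datapoints failed\")"
   ]
  },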
  {
   "cell_type": "markdown",
   "id": "4783344b",
   "metadata": {},
   "source": [
    "## Evaluate performance\n",
    "\n",
    "A guide to evaluating performance more systematically goes here."
   ]
  },
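  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c4d7e015",
   "metadata": {},
   "outputs": [],
   "source": [
    "# A minimal sketch, assuming exact string match is a reasonable metric for\n",
    "# your task (for free-form answers it often is not).\n",
    "# `predictions` and `references` are hypothetical stand-ins for the chain's\n",
    "# outputs and the dataset's reference answers; None marks a failed prediction.\n",
    "predictions = [\"Paris\", \"4\", None]\n",
    "references = [\"Paris\", \"5\", \"Blue\"]\n",
    "\n",
    "\n",
    "def normalize(text):\n",
    "    # Treat failed predictions as empty strings; ignore case and outer spaces\n",
    "    return text.strip().lower() if text is not None else \"\"\n",
    "\n",
    "\n",
    "correct = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))\n",
    "accuracy = correct / len(references)\n",
    "print(f\"Exact-match accuracy: {accuracy:.2%}\")"
   ]
  },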
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7710401a",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}