langchain/docs/extras/guides/evaluation/benchmarking_template.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "a175c650",
   "metadata": {},
   "source": [
    "# Benchmarking Template\n",
    "\n",
    "This is an example notebook that can be used to create a benchmarking notebook for a task of your choice. Evaluation is really hard, and so we greatly welcome any contributions that can make it easier for people to experiment"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "984169ca",
   "metadata": {},
   "source": [
    "It is highly reccomended that you do any evaluation/benchmarking with tracing enabled. See [here](https://langchain.readthedocs.io/en/latest/tracing.html) for an explanation of what tracing is and how to set it up."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "9fe4d1b4",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Comment this out if you are NOT using tracing\n",
    "import os\n",
    "\n",
    "os.environ[\"LANGCHAIN_HANDLER\"] = \"langchain\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0f66405e",
   "metadata": {},
   "source": [
    "## Loading the data\n",
    "\n",
    "First, let's load the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "79402a8f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# This notebook should so how to load the dataset from LangChainDatasets on Hugging Face\n",
    "\n",
    "# Please upload your dataset to https://huggingface.co/LangChainDatasets\n",
    "\n",
    "# The value passed into `load_dataset` should NOT have the `LangChainDatasets/` prefix\n",
    "from langchain.evaluation.loading import load_dataset\n",
    "\n",
    "dataset = load_dataset(\"TODO\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8a16b75d",
   "metadata": {},
   "source": [
    "## Setting up a chain\n",
    "\n",
    "This next section should have an example of setting up a chain that can be run on this dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a2661ce0",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "6c0062e7",
   "metadata": {},
   "source": [
    "## Make a prediction\n",
    "\n",
    "First, we can make predictions one datapoint at a time. Doing it at this level of granularity allows use to explore the outputs in detail, and also is a lot cheaper than running over multiple datapoints"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "d28c5e7d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Example of running the chain on a single datapoint (`dataset[0]`) goes here"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d0c16cd7",
   "metadata": {},
   "source": [
    "## Make many predictions\n",
    "Now we can make predictions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "24b4c66e",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Example of running the chain on many predictions goes here\n",
    "\n",
    "# Sometimes its as simple as `chain.apply(dataset)`\n",
    "\n",
    "# Othertimes you may want to write a for loop to catch errors"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4783344b",
   "metadata": {},
   "source": [
    "## Evaluate performance\n",
    "\n",
    "Any guide to evaluating performance in a more systematic manner goes here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7710401a",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}