{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Getting Started with Prompt Engineering [WIP]\n",
"by DAIR.AI | Elvis Saravia\n",
"\n",
"\n",
"This notebook contains examples and exercises to learning about prompt engineering.\n",
"\n",
"We will be using the [OpenAI APIs](https://platform.openai.com/) for all examples. I am using the default settings `temperature=0.7` and `top-p=1`"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Prompt Engineering Basics\n",
"\n",
"Objectives\n",
"- Load the libraries\n",
"- Review the format\n",
"- Cover basic prompts\n",
"- Review common use cases"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Below we are loading the necessary libraries, utilities, and configurations."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import openai\n",
"import os\n",
"import IPython\n",
"from langchain.llms import OpenAI\n",
"from dotenv import load_dotenv\n",
"load_dotenv()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Make sure to replace `OPENAI_KEY` with your own key."
]
},
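{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, a minimal `.env` file in the notebook's working directory might look like this (the key value is a placeholder):\n",
"\n",
"```\n",
"OPENAI_KEY=sk-...\n",
"```"
]
},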
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"# API configuration\n",
"openai.api_key = os.getenv(\"OPENAI_KEY\")\n",
"\n",
"# for LangChain\n",
"os.environ[\"OPENAI_API_KEY\"] = os.getenv(\"OPENAI_KEY\")"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"def set_open_params(\n",
" model=\"text-davinci-003\",\n",
" temperature=0.7,\n",
" max_tokens=256,\n",
" top_p=1,\n",
" frequency_penalty=0,\n",
" presence_penalty=0,\n",
"):\n",
" \"\"\" set openai parameters\"\"\"\n",
"\n",
" openai_params = {} \n",
"\n",
" openai_params['model'] = model\n",
" openai_params['temperature'] = temperature\n",
" openai_params['max_tokens'] = max_tokens\n",
" openai_params['top_p'] = top_p\n",
" openai_params['frequency_penalty'] = frequency_penalty\n",
" openai_params['presence_penalty'] = presence_penalty\n",
" return openai_params\n",
"\n",
"def get_completion(params, prompt):\n",
" \"\"\" GET completion from openai api\"\"\"\n",
"\n",
" response = openai.Completion.create(\n",
" engine = params['model'],\n",
" prompt = prompt,\n",
" temperature = params['temperature'],\n",
" max_tokens = params['max_tokens'],\n",
" top_p = params['top_p'],\n",
" frequency_penalty = params['frequency_penalty'],\n",
" presence_penalty = params['presence_penalty'],\n",
" )\n",
" return response"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Basic prompt example:"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"# basic example\n",
"params = set_open_params()\n",
"\n",
"prompt = \"The sky is\"\n",
"\n",
"response = get_completion(params, prompt)"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\" blue\\n\\nThe sky is blue during the day as a result of the way the Earth's atmosphere scatters sunlight. The blue color of the sky is due to Rayleigh scattering, which is the scattering of light by molecules in the air. When sunlight passes through the atmosphere, shorter wavelength colors like blue are scattered more widely than longer wavelength colors like red, making the sky appear blue.\""
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"response.choices[0].text"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
" blue\n",
"\n",
"The sky is blue during the day as a result of the way the Earth's atmosphere scatters sunlight. The blue color of the sky is due to Rayleigh scattering, which is the scattering of light by molecules in the air. When sunlight passes through the atmosphere, shorter wavelength colors like blue are scattered more widely than longer wavelength colors like red, making the sky appear blue."
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"IPython.display.Markdown(response.choices[0].text)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Try with different temperature to compare results:"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
" blue\n",
"\n",
"The sky is blue because of the way the atmosphere scatters sunlight. When sunlight passes through the atmosphere, the blue wavelengths are scattered more than the other colors, making the sky appear blue."
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"params = set_open_params(temperature=0)\n",
"prompt = \"The sky is\"\n",
"response = get_completion(params, prompt)\n",
"IPython.display.Markdown(response.choices[0].text)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1.1 Text Summarization"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"\n",
"Antibiotics are medications used to treat bacterial infections by killing or inhibiting the growth of bacteria, but they are not effective against viral infections and should be used appropriately to avoid antibiotic resistance."
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"params = set_open_params(temperature=0.7)\n",
"prompt = \"\"\"Antibiotics are a type of medication used to treat bacterial infections. They work by either killing the bacteria or preventing them from reproducing, allowing the body's immune system to fight off the infection. Antibiotics are usually taken orally in the form of pills, capsules, or liquid solutions, or sometimes administered intravenously. They are not effective against viral infections, and using them inappropriately can lead to antibiotic resistance. \n",
"\n",
"Explain the above in one sentence:\"\"\"\n",
"\n",
"response = get_completion(params, prompt)\n",
"IPython.display.Markdown(response.choices[0].text)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Exercise: Instruct the model to explain the paragraph in one sentence like \"I am 5\". Do you see any differences?"
]
},
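{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"One possible sketch for this exercise, reusing `params` and `get_completion` from above (the exact instruction wording is just a suggestion):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# exercise sketch: ask for an ELI5-style one-sentence explanation\n",
"prompt = \"\"\"Antibiotics are a type of medication used to treat bacterial infections. They work by either killing the bacteria or preventing them from reproducing, allowing the body's immune system to fight off the infection. Antibiotics are usually taken orally in the form of pills, capsules, or liquid solutions, or sometimes administered intravenously. They are not effective against viral infections, and using them inappropriately can lead to antibiotic resistance.\n",
"\n",
"Explain the above in one sentence to a 5-year-old:\"\"\"\n",
"\n",
"response = get_completion(params, prompt)\n",
"IPython.display.Markdown(response.choices[0].text)"
]
},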
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1.2 Question Answering"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
" Mice."
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"prompt = \"\"\"Answer the question based on the context below. Keep the answer short and concise. Respond \"Unsure about answer\" if not sure about the answer.\n",
"\n",
"Context: Teplizumab traces its roots to a New Jersey drug company called Ortho Pharmaceutical. There, scientists generated an early version of the antibody, dubbed OKT3. Originally sourced from mice, the molecule was able to bind to the surface of T cells and limit their cell-killing potential. In 1986, it was approved to help prevent organ rejection after kidney transplants, making it the first therapeutic antibody allowed for human use.\n",
"\n",
"Question: What was OKT3 originally sourced from?\n",
"\n",
"Answer:\"\"\"\n",
"\n",
"response = get_completion(params, prompt)\n",
"IPython.display.Markdown(response.choices[0].text)\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Context obtained from here: https://www.nature.com/articles/d41586-023-00400-x"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Exercise: Edit prompt and get the model to respond that it isn't sure about the answer. "
]
},
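{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"One possible sketch: keep the context but ask a question it does not answer, so the model should fall back to \"Unsure about answer\" (the question below is illustrative):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# exercise sketch: the context does not say who manufactures teplizumab today\n",
"prompt = \"\"\"Answer the question based on the context below. Keep the answer short and concise. Respond \"Unsure about answer\" if not sure about the answer.\n",
"\n",
"Context: Teplizumab traces its roots to a New Jersey drug company called Ortho Pharmaceutical. There, scientists generated an early version of the antibody, dubbed OKT3. Originally sourced from mice, the molecule was able to bind to the surface of T cells and limit their cell-killing potential. In 1986, it was approved to help prevent organ rejection after kidney transplants, making it the first therapeutic antibody allowed for human use.\n",
"\n",
"Question: Which company manufactures teplizumab today?\n",
"\n",
"Answer:\"\"\"\n",
"\n",
"response = get_completion(params, prompt)\n",
"IPython.display.Markdown(response.choices[0].text)"
]
},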
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1.3 Text Classification"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
" Neutral"
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"prompt = \"\"\"Classify the text into neutral, negative or positive.\n",
"\n",
"Text: I think the food was okay.\n",
"\n",
"Sentiment:\"\"\"\n",
"\n",
"response = get_completion(params, prompt)\n",
"IPython.display.Markdown(response.choices[0].text)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Exercise: Modify the prompt to instruct the model to provide an explanation to the answer selected. "
]
},
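{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"One possible sketch, adding an explanation request to the same prompt:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# exercise sketch: request a brief justification alongside the label\n",
"prompt = \"\"\"Classify the text into neutral, negative or positive, and briefly explain your choice.\n",
"\n",
"Text: I think the food was okay.\n",
"\n",
"Sentiment:\"\"\"\n",
"\n",
"response = get_completion(params, prompt)\n",
"IPython.display.Markdown(response.choices[0].text)"
]
},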
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1.4 Role Playing"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
" Sure. A black hole is a region of spacetime where gravity is so strong that nothing, not even light, can escape its pull. Black holes are formed when a massive star runs out of fuel and collapses under its own gravity. The collapse causes the star to become so dense that it creates an event horizon, which is the point of no return."
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"prompt = \"\"\"The following is a conversation with an AI research assistant. The assistant tone is technical and scientific.\n",
"\n",
"Human: Hello, who are you?\n",
"AI: Greeting! I am an AI research assistant. How can I help you today?\n",
"Human: Can you tell me about the creation of blackholes?\n",
"AI:\"\"\"\n",
"\n",
"response = get_completion(params, prompt)\n",
"IPython.display.Markdown(response.choices[0].text)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Exercise: Modify the prompt to instruct the model to keep AI responses concise and short."
]
},
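{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"One possible sketch, tightening the preamble to constrain response length (the exact phrasing is just a suggestion):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# exercise sketch: instruct the assistant to answer in at most two sentences\n",
"prompt = \"\"\"The following is a conversation with an AI research assistant. The assistant's answers are concise, no longer than two sentences.\n",
"\n",
"Human: Hello, who are you?\n",
"AI: Greetings! I am an AI research assistant. How can I help you today?\n",
"Human: Can you tell me about the creation of black holes?\n",
"AI:\"\"\"\n",
"\n",
"response = get_completion(params, prompt)\n",
"IPython.display.Markdown(response.choices[0].text)"
]
},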
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1.5 Code Generation"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"\n",
"SELECT StudentId, StudentName \n",
"FROM students \n",
"WHERE DepartmentId = (SELECT DepartmentId \n",
" FROM departments \n",
" WHERE DepartmentName = 'Computer Science')"
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"prompt = \"\\\"\\\"\\\"\\nTable departments, columns = [DepartmentId, DepartmentName]\\nTable students, columns = [DepartmentId, StudentId, StudentName]\\nCreate a MySQL query for all students in the Computer Science Department\\n\\\"\\\"\\\"\"\n",
"\n",
"response = get_completion(params, prompt)\n",
"IPython.display.Markdown(response.choices[0].text)\n"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### 1.6 Reasoning"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"data": {
"text/markdown": [
"\n",
"\n",
"Odd numbers: 15, 5, 13, 7, 1\n",
"Total = 41\n",
"41 is an odd number."
],
"text/plain": [
"<IPython.core.display.Markdown object>"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"prompt = \"\"\"The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1. \n",
"\n",
"Solve by breaking the problem into steps. First, identify the odd numbers, add them, and indicate whether the result is odd or even.\"\"\"\n",
"\n",
"response = get_completion(params, prompt)\n",
"IPython.display.Markdown(response.choices[0].text)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Exercise: Improve the prompt to have a better structure and output format."
]
},
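{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"One possible sketch for this exercise, making both the steps and the output format explicit:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# exercise sketch: enumerate the steps and pin down the output format\n",
"prompt = \"\"\"The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.\n",
"\n",
"Solve by breaking the problem into steps:\n",
"Step 1: Identify the odd numbers.\n",
"Step 2: Add them.\n",
"Step 3: State whether the result is odd or even.\n",
"\n",
"Use this output format:\n",
"Odd numbers: <comma-separated list>\n",
"Sum: <number>\n",
"Result: <odd or even>\"\"\"\n",
"\n",
"response = get_completion(params, prompt)\n",
"IPython.display.Markdown(response.choices[0].text)"
]
},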
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"---"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Advanced Prompting Techniques\n",
"\n",
"Objectives:\n",
"\n",
"- Cover more advanced techniques for prompting: few-shot, chain-of-thoughts,...\n",
"- Review more advanced applications"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": []
},
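{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"As a starting point, here is a minimal few-shot chain-of-thought sketch: the exemplars (adapted from the odd-numbers task in Section 1.6) demonstrate the intermediate reasoning before the final answer. The exemplar wording is illustrative, and the cell reuses `set_open_params` and `get_completion` from Section 1. We set `temperature=0` so repeated runs are comparable."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# few-shot chain-of-thought sketch: exemplars show the reasoning, not just the answer\n",
"params = set_open_params(temperature=0)\n",
"\n",
"prompt = \"\"\"The odd numbers in this group add up to an even number: 4, 8, 9, 15, 12, 2, 1.\n",
"A: Adding all the odd numbers (9, 15, 1) gives 25. The answer is False.\n",
"\n",
"The odd numbers in this group add up to an even number: 17, 10, 19, 4, 8, 12, 24.\n",
"A: Adding all the odd numbers (17, 19) gives 36. The answer is True.\n",
"\n",
"The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.\n",
"A:\"\"\"\n",
"\n",
"response = get_completion(params, prompt)\n",
"IPython.display.Markdown(response.choices[0].text)"
]
},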
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# 3. Tools and Applications\n",
"\n",
"Objective:\n",
"\n",
"- Demonstrate how to use LangChain to develop a simple application leveraging the PAL prompting technique"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"We are developing a simple application that's able to reason about the question being asked through code. \n",
"\n",
"Specifically, the application takes in some data and answers a question about the data input. The prompt includes a few exemplars which are adopted from [here](https://github.com/reasoning-machines/pal/blob/main/pal/prompt/penguin_prompt.py). "
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.1 PAL"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"# lm instance\n",
"llm = OpenAI(model_name='text-davinci-003', temperature=0)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"question = \"Which is the oldest penguin?\""
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"PENGUIN_PROMPT = '''\n",
"\"\"\"\n",
"Q: Here is a table where the first line is a header and each subsequent line is a penguin:\n",
"name, age, height (cm), weight (kg) \n",
"Louis, 7, 50, 11\n",
"Bernard, 5, 80, 13\n",
"Vincent, 9, 60, 11\n",
"Gwen, 8, 70, 15\n",
"For example: the age of Louis is 7, the weight of Gwen is 15 kg, the height of Bernard is 80 cm. \n",
"We now add a penguin to the table:\n",
"James, 12, 90, 12\n",
"How many penguins are less than 8 years old?\n",
"\"\"\"\n",
"# Put the penguins into a list.\n",
"penguins = []\n",
"penguins.append(('Louis', 7, 50, 11))\n",
"penguins.append(('Bernard', 5, 80, 13))\n",
"penguins.append(('Vincent', 9, 60, 11))\n",
"penguins.append(('Gwen', 8, 70, 15))\n",
"# Add penguin James.\n",
"penguins.append(('James', 12, 90, 12))\n",
"# Find penguins under 8 years old.\n",
"penguins_under_8_years_old = [penguin for penguin in penguins if penguin[1] < 8]\n",
"# Count number of penguins under 8.\n",
"num_penguin_under_8 = len(penguins_under_8_years_old)\n",
"answer = num_penguin_under_8\n",
"\"\"\"\n",
"Q: Here is a table where the first line is a header and each subsequent line is a penguin:\n",
"name, age, height (cm), weight (kg) \n",
"Louis, 7, 50, 11\n",
"Bernard, 5, 80, 13\n",
"Vincent, 9, 60, 11\n",
"Gwen, 8, 70, 15\n",
"For example: the age of Louis is 7, the weight of Gwen is 15 kg, the height of Bernard is 80 cm.\n",
"Which is the youngest penguin?\n",
"\"\"\"\n",
"# Put the penguins into a list.\n",
"penguins = []\n",
"penguins.append(('Louis', 7, 50, 11))\n",
"penguins.append(('Bernard', 5, 80, 13))\n",
"penguins.append(('Vincent', 9, 60, 11))\n",
"penguins.append(('Gwen', 8, 70, 15))\n",
"# Sort the penguins by age.\n",
"penguins = sorted(penguins, key=lambda x: x[1])\n",
"# Get the youngest penguin's name.\n",
"youngest_penguin_name = penguins[0][0]\n",
"answer = youngest_penguin_name\n",
"\"\"\"\n",
"Q: Here is a table where the first line is a header and each subsequent line is a penguin:\n",
"name, age, height (cm), weight (kg) \n",
"Louis, 7, 50, 11\n",
"Bernard, 5, 80, 13\n",
"Vincent, 9, 60, 11\n",
"Gwen, 8, 70, 15\n",
"For example: the age of Louis is 7, the weight of Gwen is 15 kg, the height of Bernard is 80 cm.\n",
"What is the name of the second penguin sorted by alphabetic order?\n",
"\"\"\"\n",
"# Put the penguins into a list.\n",
"penguins = []\n",
"penguins.append(('Louis', 7, 50, 11))\n",
"penguins.append(('Bernard', 5, 80, 13))\n",
"penguins.append(('Vincent', 9, 60, 11))\n",
"penguins.append(('Gwen', 8, 70, 15))\n",
"# Sort penguins by alphabetic order.\n",
"penguins_alphabetic = sorted(penguins, key=lambda x: x[0])\n",
"# Get the second penguin sorted by alphabetic order.\n",
"second_penguin_name = penguins_alphabetic[1][0]\n",
"answer = second_penguin_name\n",
"\"\"\"\n",
"{question}\n",
"\"\"\"\n",
"'''.strip() + '\\n'"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Now that we have the prompt and question. We can send it to the model. It should output the steps, in code, needed to get the solution to the answer."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"# Put the penguins into a list.\n",
"penguins = []\n",
"penguins.append(('Louis', 7, 50, 11))\n",
"penguins.append(('Bernard', 5, 80, 13))\n",
"penguins.append(('Vincent', 9, 60, 11))\n",
"penguins.append(('Gwen', 8, 70, 15))\n",
"# Sort the penguins by age.\n",
"penguins = sorted(penguins, key=lambda x: x[1], reverse=True)\n",
"# Get the oldest penguin's name.\n",
"oldest_penguin_name = penguins[0][0]\n",
"answer = oldest_penguin_name\n"
]
}
],
"source": [
"llm_out = llm(PENGUIN_PROMPT.format(question=question))\n",
"print(llm_out)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Vincent\n"
]
}
],
"source": [
"exec(llm_out)\n",
"print(answer)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"That's the correct answer! Vincent is the oldest penguin. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Exercise: Try a different question and see what's the result."
]
},
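{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to try the exercise, reusing `llm` and `PENGUIN_PROMPT` from above (the question below is just an example; the generated code may vary):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# exercise sketch: ask a different question against the same PAL prompt\n",
"question = \"What is the name of the heaviest penguin?\"\n",
"llm_out = llm(PENGUIN_PROMPT.format(question=question))\n",
"print(llm_out)\n",
"exec(llm_out)\n",
"print(answer)"
]
},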
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3.2"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
2023-02-17 06:42:29 +00:00
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "promptlecture",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.16"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "f38e0373277d6f71ee44ee8fea5f1d408ad6999fda15d538a69a99a1665a839d"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}