OpenAI Cookbook

The OpenAI Cookbook shares example code for accomplishing common tasks with the OpenAI API.

To run these examples, you'll need an OpenAI account and API key (create a free account).

Most code examples are written in Python, though the concepts can be applied in any language.
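
As a minimal sketch of the API key setup mentioned above, the snippet below reads a key from an environment variable instead of hard-coding it. The variable name `OPENAI_API_KEY` is a common convention and an assumption here, and `load_api_key` is a hypothetical helper written for illustration, not part of any library.

```python
import os
from typing import Mapping, Optional


def load_api_key(
    env: Optional[Mapping[str, str]] = None,
    var_name: str = "OPENAI_API_KEY",  # assumed variable name, adjust as needed
) -> str:
    """Return the API key stored in `var_name`, or raise if it is missing."""
    # Fall back to the real process environment when no mapping is passed in.
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable before running the examples."
        )
    return key
```

Keeping the key out of source files makes it harder to commit it to version control by accident.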

Recently added/updated 🆕 ✨

Related resources from around the web [May 22, 2023]
Embeddings playground (streamlit app) [May 19, 2023]
How to use a multi-step prompt to write unit tests [May 19, 2023]
How to create dynamic masks with DALL·E and Segment Anything [May 19, 2023]
Question answering using embeddings [Apr 14, 2023]

Guides & examples

Beyond the code examples here, you can learn about the OpenAI API from the following resources:

Experiment with ChatGPT
Try the API in the OpenAI Playground
Read about the API in the OpenAI Documentation
Get help in the OpenAI Help Center
Discuss the API in the OpenAI Community Forum or OpenAI Discord channel
See example prompts in the OpenAI Examples
Stay updated with the OpenAI Blog

People are writing great tools and papers for improving outputs from GPT. Here are some cool ones we've seen:

Prompting libraries & tools

Guidance: A handy-looking Python library from Microsoft that uses Handlebars templating to interleave generation, prompting, and logical control.
LangChain: A popular Python/JavaScript library for chaining sequences of language model prompts.
FLAML (A Fast Library for Automated Machine Learning & Tuning): A Python library for automating selection of models, hyperparameters, and other tunable choices.
Chainlit: A Python library for making chatbot interfaces.
Guardrails.ai: A Python library for validating outputs and retrying failures. Still in alpha, so expect sharp edges and bugs.
Semantic Kernel: A Python/C# library from Microsoft that supports prompt templating, function chaining, vectorized memory, and intelligent planning.
Outlines: A Python library that provides a domain-specific language to simplify prompting and constrain generation.
Promptify: A small Python library for using language models to perform NLP tasks.
Scale Spellbook: A paid product for building, comparing, and shipping language model apps.
PromptPerfect: A paid product for testing and improving prompts.
Weights & Biases: A paid product for tracking model training and prompt engineering experiments.
OpenAI Evals: An open-source library for evaluating task performance of language models and prompts.
LlamaIndex: A Python library for augmenting LLM apps with data.
Arthur Shield: A paid product for detecting toxicity, hallucination, prompt injection, etc.

Prompting guides

Brex's Prompt Engineering Guide: Brex's introduction to language models and prompt engineering.
promptingguide.ai: A prompt engineering guide that demonstrates many techniques.
OpenAI Cookbook: Techniques to improve reliability: A slightly dated (Sep 2022) review of techniques for prompting language models.
Lil'Log Prompt Engineering: An OpenAI researcher's review of the prompt engineering literature (as of March 2023).
learnprompting.org: An introductory course to prompt engineering.

Video courses

Andrew Ng's DeepLearning.AI: A short course on prompt engineering for developers.
Andrej Karpathy's Let's build GPT: A detailed dive into the machine learning underlying GPT.
Prompt Engineering by DAIR.AI: A one-hour video on various prompt engineering techniques.

Papers on advanced prompting to improve reasoning

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (2022): Using few-shot prompts to ask models to think step by step improves their reasoning. PaLM's score on math word problems (GSM8K) rises from 18% to 57%.
Self-Consistency Improves Chain of Thought Reasoning in Language Models (2022): Taking votes from multiple outputs improves accuracy even more. Voting across 40 outputs raises PaLM's score on math word problems further, from 57% to 74%, and code-davinci-002's from 60% to 78%.
Tree of Thoughts: Deliberate Problem Solving with Large Language Models (2023): Searching over trees of step-by-step reasoning helps even more than voting over chains of thought. It lifts GPT-4's scores on creative writing and crosswords.
Language Models are Zero-Shot Reasoners (2022): Telling instruction-following models to think step by step improves their reasoning. It lifts text-davinci-002's score on math word problems (GSM8K) from 13% to 41%.
Large Language Models Are Human-Level Prompt Engineers (2023): Automated searching over possible prompts found a prompt that lifts scores on math word problems (GSM8K) to 43%, 2 percentage points above the human-written prompt in Language Models are Zero-Shot Reasoners.
Reprompting: Automated Chain-of-Thought Prompt Inference Through Gibbs Sampling (2023): Automated searching over possible chain-of-thought prompts improved ChatGPT's scores on a few benchmarks by 0–20 percentage points.
Faithful Reasoning Using Large Language Models (2022): Reasoning can be improved by a system that combines: chains of thought generated by alternating selection and inference prompts, a halter model that chooses when to halt selection-inference loops, a value function to search over multiple reasoning paths, and sentence labels that help avoid hallucination.
STaR: Bootstrapping Reasoning With Reasoning (2022): Chain-of-thought reasoning can be baked into models via fine-tuning. For tasks with an answer key, example chains of thought can be generated by language models.
ReAct: Synergizing Reasoning and Acting in Language Models (2023): For tasks with tools or an environment, chain of thought works better if you prescriptively alternate between Reasoning steps (thinking about what to do) and Acting steps (getting information from a tool or environment).
Reflexion: an autonomous agent with dynamic memory and self-reflection (2023): Retrying tasks with memory of prior failures improves subsequent performance.
Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP (2023): Models augmented with knowledge via a "retrieve-then-read" pipeline can be improved with multi-hop chains of searches.
Improving Factuality and Reasoning in Language Models through Multiagent Debate (2023): Generating debates between a few ChatGPT agents over a few rounds improves scores on various benchmarks. Math word problem scores rise from 77% to 85%.
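
Two of the ideas above, asking the model to think step by step and voting across several sampled outputs, can be sketched in a few lines. This is an illustrative sketch rather than the method from any of the papers: `sample_answer` is a hypothetical callable standing in for a real model call that returns a final answer string, and the prompt layout is an assumption.

```python
from collections import Counter
from typing import Callable, List


def build_step_by_step_prompt(question: str) -> str:
    """Append a zero-shot "think step by step" cue to a question."""
    return f"Q: {question}\nA: Let's think step by step."


def majority_vote(answers: List[str]) -> str:
    """Return the most common answer; ties go to the answer seen first."""
    if not answers:
        raise ValueError("need at least one answer to vote on")
    return Counter(answers).most_common(1)[0][0]


def self_consistent_answer(
    sample_answer: Callable[[str], str],  # hypothetical stand-in for a model call
    question: str,
    n_samples: int = 5,
) -> str:
    """Sample several answers to the same prompt and keep the majority answer."""
    prompt = build_step_by_step_prompt(question)
    answers = [sample_answer(prompt) for _ in range(n_samples)]
    return majority_vote(answers)


# Demo with canned answers in place of a real model.
canned = iter(["18", "20", "18", "18", "17"])


def fake_sampler(prompt: str) -> str:
    return next(canned)


print(self_consistent_answer(fake_sampler, "How many apples are left?"))  # prints 18
```

In practice the sampler would call a model with a nonzero temperature so that the sampled reasoning paths differ, and the final answer would need to be parsed out of each completion before voting.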

Contributing

If there are examples or guides you'd like to see, feel free to suggest them on the issues page. We are also happy to accept high quality pull requests, as long as they fit the scope of the repo.
