This repo holds the code, demos, and logs for the Reflexion paper (v2 not out yet): [Reflexion: Language Agents with Verbal Reinforcement Learning. Noah Shinn, Federico Cassano, Beck Labash, Ashwin Gopinath, Karthik Narasimhan, Shunyu Yao. _Preprint_, 2023](https://arxiv.org/abs/2303.11366)
We provide a set of notebooks to easily run, explore, and interact with the results of the reasoning experiments. Each experiment consists of a random sample of 100 questions from the HotPotQA distractor dataset. Each question in the sample is attempted by an agent with a specific type and reflexion strategy.
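
As a rough illustration of the sampling step described above, the sketch below draws 100 distinct items from a larger pool with a fixed seed. The function name `sample_questions`, the seed value, and the stand-in question pool are all assumptions made for illustration; the actual sampling code and seed used for the experiments are not specified here.

```python
import random


def sample_questions(questions, k=100, seed=0):
    """Return k distinct questions drawn uniformly at random.

    Hypothetical helper: the seed and function name are illustrative,
    not taken from the repository.
    """
    rng = random.Random(seed)  # local RNG so global random state is untouched
    return rng.sample(questions, k)


# Stand-in pool; in practice this would be the HotPotQA distractor questions.
pool = [f"question-{i}" for i in range(1000)]
subset = sample_questions(pool)
```

Using a dedicated `random.Random` instance keeps the draw repeatable across runs without affecting other code that relies on the global random state.
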
#### Setup
To get started:
1. Clone this repo and move to the HotPotQA directory.
2. Install the module dependencies into your environment:
```bash
pip install -r requirements.txt
```
3. Set the `OPENAI_API_KEY` environment variable to your OpenAI API key:
```bash
export OPENAI_API_KEY=<yourkey>
```
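
Before launching a notebook, it can help to confirm that the key is actually visible to Python. The following is a minimal sketch, not part of the repository; the helper name `require_api_key` is hypothetical.

```python
import os


def require_api_key(env=None):
    """Return the OpenAI API key, or raise if it is missing or blank.

    Hypothetical helper for illustration; `env` defaults to os.environ
    and can be replaced with a plain dict for testing.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running the notebooks."
        )
    return key
```

Calling `require_api_key()` in the first notebook cell would surface a missing key immediately rather than partway through a run.
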
#### Agent Types
Agent type is determined by the notebook you choose to run. The available agent types include:
- `ReAct` - ReAct Agent
- `CoT_context` - CoT Agent given supporting context about the question
- `CoT_no_context` - CoT Agent given no supporting context about the question
The notebook for each agent type is located in the `./hotpot_runs/notebooks` directory.
#### Reflexion Strategies
Each notebook allows you to specify the reflexion strategy to be used by the agents. The available reflexion strategies, which are defined in an `Enum`, include:
- `ReflexionStrategy.NONE` - The agent is not given any information about its last attempt.
- `ReflexionStrategy.LAST_ATTEMPT` - The agent is given its reasoning trace from its last attempt on the question as context.
- `ReflexionStrategy.REFLEXION` - The agent is given its self-reflection on the last attempt as context.
- `ReflexionStrategy.LAST_ATTEMPT_AND_REFLEXION` - The agent is given both its reasoning trace and self-reflection on the last attempt as context.
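
The four strategies above can be sketched as an `Enum` plus a small function that decides what context to carry into the next attempt. The member names come from the list above; the member values, the `build_context` helper, and the text labels it emits are assumptions for illustration and may differ from the repository's actual code.

```python
from enum import Enum


class ReflexionStrategy(Enum):
    # Member names follow the README; the string values are assumed.
    NONE = "none"
    LAST_ATTEMPT = "last_attempt"
    REFLEXION = "reflexion"
    LAST_ATTEMPT_AND_REFLEXION = "last_attempt_and_reflexion"


def build_context(strategy, last_trace, last_reflection):
    """Assemble the extra context given to the agent on a retry.

    Hypothetical helper: shows which pieces each strategy includes.
    """
    parts = []
    if strategy in (
        ReflexionStrategy.LAST_ATTEMPT,
        ReflexionStrategy.LAST_ATTEMPT_AND_REFLEXION,
    ):
        parts.append(f"Previous attempt:\n{last_trace}")
    if strategy in (
        ReflexionStrategy.REFLEXION,
        ReflexionStrategy.LAST_ATTEMPT_AND_REFLEXION,
    ):
        parts.append(f"Reflection:\n{last_reflection}")
    # NONE falls through both checks and yields an empty context.
    return "\n\n".join(parts)
```

For example, `build_context(ReflexionStrategy.NONE, trace, reflection)` returns an empty string, while `LAST_ATTEMPT_AND_REFLEXION` returns the trace followed by the reflection.
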
Because access to GPT-4 is limited and its API charges are significant, it may not be feasible for individual developers to reproduce these results. All runs from the paper, along with additional results, are logged in `./alfworld_runs/root` for decision-making and `./hotpotqa_runs/root` for reasoning.