# Reflexion: Language Agents with Verbal Reinforcement Learning
This repo holds the code, demos, and logs for: [Reflexion: Language Agents with Verbal Reinforcement Learning. Noah Shinn, Federico Cassano, Beck Labash, Ashwin Gopinath, Karthik Narasimhan, Shunyu Yao. _Preprint_, 2023](https://arxiv.org/abs/2303.11366)
![Reflexion RL diagram](./figures/reflexion_rl.png)
![Reflexion tasks](./figures/reflexion_tasks.png)
We release the LeetcodeHardGym [here](https://github.com/GammaTauAI/leetcode-hard-gym)
### Note
- `decision-making`: `./alfworld_runs` and `./webshop_runs`
- `programming`: v2 not released yet, to be cleaned soon
- `reasoning`: `./hotpotqa_runs`
Due to the nature of these experiments, it may not be feasible for individual developers to rerun the results, as GPT-4 access is limited and API charges are significant. All runs from the paper and additional results are logged in `./programming_runs/root` for programming, `./alfworld_runs/root` for decision-making, and `./hotpotqa_runs/root` for reasoning. Programming runs can be checked against the unit tests provided by their respective benchmarks using the validation scripts for [Python](https://github.com/noahshinn024/reflexion/blob/main/programming/validate_py_results.py) and [Rust](https://github.com/noahshinn024/reflexion/blob/main/programming/validate_rs_results.py).
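As a sketch of how the saved programming logs might be validated (the log-file argument is an assumption about each script's CLI, not a documented interface — inspect the scripts before relying on this):

```shell
# Hypothetical invocations: the path argument is an assumption --
# check each script's argument parsing for the actual interface.
python programming/validate_py_results.py ./programming_runs/root/<run_name>/results.jsonl
python programming/validate_rs_results.py ./programming_runs/root/<run_name>/results.jsonl
```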
### To Run: decision-making (AlfWorld)
Clone this repo and move to the AlfWorld directory
```bash
git clone https://github.com/noahshinn024/reflexion && cd ./reflexion/alfworld_runs
```
Specify the run parameters in `./run_reflexion.sh`.
- `num_trials`: number of iterative learning steps
- `num_envs`: number of task-environment pairs per trial
- `run_name`: the name for this run
- `use_memory`: use persisting memory to store self-reflections (disable to run a baseline)
- `is_resume`: use a logging directory to resume a previous run
- `resume_dir`: the logging directory from which to resume the previous run
- `start_trial_num`: if resuming a run, the trial number at which to start
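For illustration, a fresh (non-resumed) configuration inside `./run_reflexion.sh` might look like the following. The flag names mirror the parameters above but are assumptions — check the script for the exact interface and defaults:

```shell
# Hypothetical contents of ./run_reflexion.sh -- flag names follow the
# parameter list above; the exact main.py interface is an assumption.
python main.py \
  --num_trials 10 \
  --num_envs 134 \
  --run_name "reflexion_base_run" \
  --use_memory 1 \
  --is_resume 0
```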
Run the trial
```bash
./run_reflexion.sh
```
The logs will be sent to `./root/<run_name>`.
### To Run: reasoning (HotPotQA)
Clone this repo and move to the HotPotQA directory
```bash
git clone https://github.com/noahshinn024/reflexion && cd ./reflexion/hotpotqa_runs
```
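Assuming `./hotpotqa_runs` follows the same layout as the AlfWorld runs (an assumption — check the directory for its own run script and parameters), launching a trial would look like:

```shell
# Assumes hotpotqa_runs ships a run_reflexion.sh analogous to the one in
# alfworld_runs; edit its parameters first, then launch:
./run_reflexion.sh
```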
### Warning
Please do not run the Reflexion programming agent in an unsecured environment, as the generated code is not validated before execution.
### Other Notes
Read the original blog [here](https://nanothoughts.substack.com/p/reflecting-on-reflexion)
Check out an interesting type-inference implementation here: [OpenTau](https://github.com/GammaTauAI/opentau)
For all questions, contact [noahshinn024@gmail.com](mailto:noahshinn024@gmail.com)
### Cite

# Programming runs
##### Reflexion programming v2 is not released yet but will be available in a few days after the code is cleaned up

```bash
python main.py \
  --language "py" \
  --model "gpt-4" \
  --pass_at_k "1" \
  --max_iters "2" \
  --verbose
```
