
Reflexion: Language Agents with Verbal Reinforcement Learning

Figure: Reflexion RL diagram

This repo holds the code, demos, and logs for the paper "Reflexion: Language Agents with Verbal Reinforcement Learning" by Noah Shinn, Federico Cassano, Beck Labash, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao (preprint, 2023).

Figure: Reflexion tasks

We release the LeetcodeHardGym here

Another Note

Because GPT-4 access is limited and its API charges are significant, individual developers may not be able to rerun these experiments. All runs from the paper, along with additional results, are logged in ./programming_runs/root for programming, ./alfworld_runs/root for decision-making, and ./hotpotqa_runs/root for reasoning. Programming runs can be validated with the scripts here and here, which check the Python and Rust solutions against the unit tests provided by their respective benchmarks.
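
As a rough illustration of what validating a logged programming run involves, here is a minimal Python sketch. It is not the repository's validation script. It assumes, hypothetically, that each logged record is one JSON line with a `solution` field (generated Python source) and a `test` field (assert-based unit tests); the names `passes_tests` and `pass_rate` are also invented for this example, and the real log schema may differ.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path


def passes_tests(solution: str, tests: str, timeout_s: float = 10.0) -> bool:
    """Run `solution` followed by `tests` in a fresh interpreter.

    Returns True only if the child process exits with code 0 before the timeout.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(solution + "\n\n" + tests + "\n", encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                cwd=tmp,              # keep relative file writes inside the temp dir
                capture_output=True,  # swallow whatever the candidate prints
                timeout=timeout_s,    # the child is killed if it runs too long
            )
        except subprocess.TimeoutExpired:
            return False
        return proc.returncode == 0


def pass_rate(jsonl_lines) -> float:
    """Fraction of records whose solution passes its tests.

    Assumed (hypothetical) record schema: {"solution": str, "test": str}.
    """
    records = [json.loads(line) for line in jsonl_lines if line.strip()]
    if not records:
        return 0.0
    passed = sum(passes_tests(r["solution"], r["test"]) for r in records)
    return passed / len(records)
```

With a log file in that assumed format, `pass_rate(open(path, encoding="utf-8"))` would give the fraction of passing solutions. A child process with a timeout is not a sandbox, so a check like this should itself run inside a container or virtual machine.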

Warning

Please do not run the Reflexion programming agent in an insecure environment: the generated code is executed without being validated first.

Other Notes

Check out the code for the original draft here

Read the original blog here

Check out an interesting type-inference implementation here: OpenTau

If you have any questions, contact noahshinn024@gmail.com

Cite

@article{shinn2023reflexion,
  title={Reflexion: Language Agents with Verbal Reinforcement Learning},
  author={Shinn, Noah and Cassano, Federico and Labash, Beck and Gopinath, Ashwin and Narasimhan, Karthik and Yao, Shunyu},
  journal={arXiv preprint arXiv:2303.11366},
  year={2023}
}