Mastering HumanEval with Reflexion
This is a spin-off project inspired by the paper: Reflexion: an autonomous agent with dynamic memory and self-reflection. Noah Shinn, Beck Labash, Ashwin Gopinath. Preprint, 2023
Read more about this project in this post.
Check out an interesting type-inference implementation here: OpenTau
Check out the code for the original paper here
If you have any questions, please contact noahshinn024@gmail.com
Cloning The Repository
The repository contains git submodules. To clone the repo with the submodules, run:
git clone --recurse-submodules
Note
Because these experiments require GPT-4 access and incur significant API charges, it may not be feasible for individual developers to rerun them. In response to recent requests, both trials have been rerun once more; the results are dumped in ./root, along with a script here to validate the solutions against the unit tests provided by HumanEval.
To run the validation on your log files or the provided log files:
python ./validate_py_results.py <path to jsonlines file>
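As a rough illustration of what validating a log file against HumanEval-style unit tests can involve, here is a minimal sketch. It is not the code in validate_py_results.py. The record field name `solution` and the helper names `read_jsonl`, `passes_tests`, and `validate` are assumptions made for the example; `test` and `entry_point` follow the HumanEval convention of a `check(candidate)` function. Each candidate runs in a separate interpreter with a timeout, which is not a sandbox, so the same caution about untrusted generated code applies.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path


def read_jsonl(path):
    """Yield one dict per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def passes_tests(solution, test, entry_point, timeout=10.0):
    """Run solution + tests in a fresh interpreter; True if it exits with 0."""
    # HumanEval tests define check(candidate); calling it raises on failure.
    program = f"{solution}\n\n{test}\n\ncheck({entry_point})\n"
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(program, encoding="utf-8")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            # Treat a hang (e.g. an infinite loop) as a failed solution.
            return False
    return result.returncode == 0


def validate(path):
    """Return (number of passing records, total records) for a log file."""
    # "solution" is a hypothetical field name; adjust to the actual log schema.
    records = list(read_jsonl(path))
    passed = sum(
        passes_tests(r["solution"], r["test"], r["entry_point"]) for r in records
    )
    return passed, len(records)
```

Calling `validate("<path to jsonlines file>")` would return a pair such as `(passed, total)`, from which a pass rate can be computed.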
Warning
Please do not run the Reflexion agent in an insecure environment, as the generated code is not validated before execution.
Cite
Note: This spin-off implementation relaxes the internal success criteria proposed in the original paper.
@article{shinn2023reflexion,
title={Reflexion: an autonomous agent with dynamic memory and self-reflection},
author={Shinn, Noah and Labash, Beck and Gopinath, Ashwin},
journal={arXiv preprint arXiv:2303.11366},
year={2023}
}