Mastering HumanEval with Reflexion
This is a spin-off project inspired by the paper: Reflexion: an autonomous agent with dynamic memory and self-reflection. Noah Shinn, Beck Labash, Ashwin Gopinath. Preprint, 2023
Read more about this project in this post.
Check out an interesting type-inference implementation here: OpenTau
Check out the code for the original paper here
If you have any questions, please contact noahshinn024@gmail.com
Cloning The Repository
The repository contains git submodules. To clone the repo with the submodules, run:
git clone --recurse-submodules
Note
Because these experiments require GPT-4 access and incur significant API charges, it may not be feasible for individual developers to rerun them. In response to recent requests, both trials have been rerun once more and the logs are dumped in ./root, with a script here to validate the solutions against the unit tests provided by HumanEval.
To run the validation on your log files or the provided log files:
python ./validate_py_results.py <path to jsonlines file>
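For readers who want a feel for what such a validation pass involves, here is a minimal sketch. It is not the contents of validate_py_results.py. It assumes each line of the log is a JSON object with a `solution` field (a hypothetical name, taken here to hold the complete function definition) alongside the HumanEval-style `test` field, which defines `check(candidate)`, and `entry_point` field. The helper names `load_jsonl`, `passes_tests`, and `validate` are illustrative only. Each candidate runs in a child process with a timeout, which limits hangs but is not a sandbox, so the warning below still applies.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path


def load_jsonl(path):
    """Read one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def passes_tests(record, timeout=10.0):
    """Run one solution against its tests in a separate Python process."""
    # Assumed record layout: `solution` (hypothetical field name) holds the
    # full function; `test` defines check(candidate) as HumanEval does.
    program = "\n\n".join([
        record["solution"],
        record["test"],
        f"check({record['entry_point']})",
    ])
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(program, encoding="utf-8")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout,
                cwd=tmp,
            )
        except subprocess.TimeoutExpired:
            # Treat a hang as a failed solution.
            return False
    # A failed assertion or any other exception gives a non-zero exit code.
    return result.returncode == 0


def validate(path):
    """Return (number of passing solutions, total number of records)."""
    records = load_jsonl(path)
    passed = sum(passes_tests(record) for record in records)
    return passed, len(records)


if __name__ == "__main__" and len(sys.argv) == 2:
    passed, total = validate(sys.argv[1])
    print(f"{passed}/{total} solutions passed")
```

Running each candidate in its own process keeps a crash or infinite loop in one solution from stopping the whole pass, at the cost of one interpreter start-up per record.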
Warning
Please do not run the Reflexion agent in an insecure environment, as the generated code is not validated before execution.
Cite
Note: This is a spin-off implementation that relaxes the internal success criteria proposed in the original paper.
@article{shinn2023reflexion,
title={Reflexion: an autonomous agent with dynamic memory and self-reflection},
author={Shinn, Noah and Labash, Beck and Gopinath, Ashwin},
journal={arXiv preprint arXiv:2303.11366},
year={2023}
}