# Mastering HumanEval with Reflexion

This is a spin-off project inspired by the paper: [Reflexion: an autonomous agent with dynamic memory and self-reflection. Noah Shinn, Beck Labash, Ashwin Gopinath. _Preprint_, 2023](https://arxiv.org/abs/2303.11366)

Read more about this project in this [post](https://nanothoughts.substack.com/p/reflecting-on-reflexion).

Check out an interesting type-inference implementation here: [OpenTau](https://github.com/GammaTauAI/opentau)
Check out the code for the original paper [here](https://github.com/noahshinn024/reflexion)
Check out a new superhuman programming agent gym [here](https://github.com/GammaTauAI/leetcode-hard-gym)

### Note
This repo contains scratch code that was used for testing. The full, benchmark-agnostic, language-agnostic version of Reflexion for code generation will be released after the first version of the upcoming paper, to respect the privacy of the work (and collaboration) in progress.

If you have any questions, please contact [noahshinn024@gmail.com](mailto:noahshinn024@gmail.com).

![architecture](./media/architecture.png)
![result](./media/performance.png)

### Another Note
Because these experiments require GPT-4 access and incur significant API charges, it may not be feasible for individual developers to rerun the results. In response to recent requests, both trials have been rerun once more and the logs are dumped in `./root`, with a script [here](https://github.com/noahshinn024/reflexion-human-eval/blob/main/validate_py_results.py) to validate the solutions against the unit tests provided by [HumanEval](https://github.com/openai/human-eval).
To run the validation on your own log files or on the provided logs:
```bash
python ./validate_py_results.py <path to jsonlines file>
```
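
For reference, the sketch below shows one way such a validation pass could be built directly on OpenAI's `human-eval` package. It is a minimal illustration, not the actual script: the log fields `task_id` and `solution` are assumed names, and `human-eval` ships with its `exec` call commented out in `execution.py`, which you must deliberately enable (after reading its warnings) for results to be meaningful.

```python
# Minimal validation sketch built on OpenAI's human-eval package.
# Assumption: each line of the log file is a JSON object with "task_id"
# and "solution" fields -- hypothetical names; see validate_py_results.py
# for the actual schema. The solution is treated as a HumanEval-style
# completion here.
import json
import sys

from human_eval.data import read_problems
from human_eval.execution import check_correctness


def validate(path: str) -> None:
    problems = read_problems()  # maps task_id -> HumanEval problem dict
    passed = total = 0
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            problem = problems[record["task_id"]]
            # Runs the problem's unit tests against the logged solution.
            result = check_correctness(problem, record["solution"], timeout=10.0)
            passed += result["passed"]
            total += 1
    print(f"{passed}/{total} solutions passed")


if __name__ == "__main__":
    validate(sys.argv[1])
```
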
### Warning
Please do not run the Reflexion agent in an insecure environment, as the generated code is not validated before execution.
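
If you do need to execute generated solutions locally, one common mitigation (not part of this repo) is to run them in a throwaway subprocess with a hard timeout, ideally inside a container or VM. A minimal sketch with a hypothetical `run_untrusted` helper:

```python
# Hypothetical helper -- not part of this repo. Runs untrusted generated
# code in a separate Python process with a hard timeout. This limits
# runaway loops but is NOT a real sandbox; use a container or VM for
# genuine isolation.
import subprocess
import sys
import tempfile


def run_untrusted(code: str, timeout_s: float = 10.0) -> subprocess.CompletedProcess:
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    # -I runs the interpreter in isolated mode (ignores PYTHON* env vars
    # and the user site-packages). Raises subprocess.TimeoutExpired if
    # the code hangs past the deadline.
    return subprocess.run(
        [sys.executable, "-I", path],
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
```
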
### Cite
**Note**: This is a spin-off implementation that relaxes the internal success criteria proposed in the [original paper](https://arxiv.org/abs/2303.11366).
```bibtex
@article{shinn2023reflexion,
  title={Reflexion: an autonomous agent with dynamic memory and self-reflection},
  author={Shinn, Noah and Labash, Beck and Gopinath, Ashwin},
  journal={arXiv preprint arXiv:2303.11366},
  year={2023}
}
```