# Mastering HumanEval with Reflexion

This is a spin-off project inspired by the paper: [Reflexion: an autonomous agent with dynamic memory and self-reflection. Noah Shinn, Beck Labash, Ashwin Gopinath. _Preprint_, 2023](https://arxiv.org/abs/2303.11366)

Read more about this project in this [post](https://nanothoughts.substack.com/p/reflecting-on-reflexion).

Check out an interesting type-inference implementation here: [OpenTau](https://github.com/GammaTauAI/opentau)
Check out the code for the original paper [here](https://github.com/noahshinn024/reflexion)
Check out a new superhuman programming agent gym [here](https://github.com/GammaTauAI/leetcode-hard-gym)

### Note
This repo contains scratch code that was used for testing. The full, benchmark-agnostic, language-agnostic version of Reflexion for code generation will be released after the first version of the upcoming paper, to respect the privacy of the work (and collaboration) in progress.

If you have any questions, please contact [noahshinn024@gmail.com](mailto:noahshinn024@gmail.com).

![architecture](./media/architecture.png)
![result](./media/performance.png)

### Another Note
Because these experiments require GPT-4 access and incur significant API charges, it may not be feasible for individual developers to rerun the results. In response to recent requests, both trials have been rerun once more and the logs are dumped in `./root`, with a script [here](https://github.com/noahshinn024/reflexion-human-eval/blob/main/validate_py_results.py) to validate the solutions against the unit tests provided by [HumanEval](https://github.com/openai/human-eval).
To run the validation on your own log files or on the provided logs:
```bash
python ./validate_py_results.py <path to jsonlines file>
```
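
For reference, the sketch below shows one way such a validation pass could be built directly on OpenAI's `human-eval` package. It is a minimal illustration, not the actual script: the log fields `task_id` and `solution` are assumed names, and `human-eval` ships with its `exec` call commented out in `execution.py`, which you must deliberately enable (after reading its warnings) for results to be meaningful.

```python
# Minimal validation sketch built on OpenAI's human-eval package.
# Assumption: each line of the log file is a JSON object with "task_id"
# and "solution" fields -- hypothetical names; see validate_py_results.py
# for the actual schema. The solution is treated as a HumanEval-style
# completion here.
import json
import sys

from human_eval.data import read_problems
from human_eval.execution import check_correctness


def validate(path: str) -> None:
    problems = read_problems()  # maps task_id -> HumanEval problem dict
    passed = total = 0
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            problem = problems[record["task_id"]]
            # Runs the problem's unit tests against the logged solution.
            result = check_correctness(problem, record["solution"], timeout=10.0)
            passed += result["passed"]
            total += 1
    print(f"{passed}/{total} solutions passed")


if __name__ == "__main__":
    validate(sys.argv[1])
```
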
### Warning
Please do not run the Reflexion agent in an insecure environment, as the generated code is not validated before execution.
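
If you do need to execute generated solutions locally, one common mitigation (not part of this repo) is to run them in a throwaway subprocess with a hard timeout, ideally inside a container or VM. A minimal sketch with a hypothetical `run_untrusted` helper:

```python
# Hypothetical helper -- not part of this repo. Runs untrusted generated
# code in a separate Python process with a hard timeout. This limits
# runaway loops but is NOT a real sandbox; use a container or VM for
# genuine isolation.
import subprocess
import sys
import tempfile


def run_untrusted(code: str, timeout_s: float = 10.0) -> subprocess.CompletedProcess:
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    # -I runs the interpreter in isolated mode (ignores PYTHON* env vars
    # and the user site-packages). Raises subprocess.TimeoutExpired if
    # the code hangs past the deadline.
    return subprocess.run(
        [sys.executable, "-I", path],
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
```
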
### Cite
**Note**: This is a spin-off implementation that relaxes the internal success criteria proposed in the [original paper](https://arxiv.org/abs/2303.11366).
```bibtex
@article{shinn2023reflexion,
  title={Reflexion: an autonomous agent with dynamic memory and self-reflection},
  author={Shinn, Noah and Labash, Beck and Gopinath, Ashwin},
  journal={arXiv preprint arXiv:2303.11366},
  year={2023}
}
```