Merge pull request #198 from dave1010/patch-1
Add Tree-of-Thought Prompting
commit 54c9025a5e
@@ -31,3 +31,13 @@ Code available [here](https://github.com/princeton-nlp/tree-of-thought-llm) and
At a high level, the main ideas of [Yao et al. (2023)](https://arxiv.org/abs/2305.10601) and [Long (2023)](https://arxiv.org/abs/2305.08291) are similar. Both enhance an LLM's capability for complex problem solving through tree search via a multi-round conversation. One of the main differences is that [Yao et al. (2023)](https://arxiv.org/abs/2305.10601) leverages DFS/BFS/beam search, while the tree search strategy (i.e., when to backtrack and by how many levels, etc.) proposed in [Long (2023)](https://arxiv.org/abs/2305.08291) is driven by a "ToT Controller" trained through reinforcement learning. DFS/BFS/beam search are generic search strategies with no adaptation to specific problems. In comparison, a ToT Controller trained through RL may be able to learn from new data sets or through self-play (as AlphaGo did, versus brute-force search), so an RL-based ToT system can continue to evolve and acquire new knowledge even with a fixed LLM.
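To make the tree-search idea concrete, below is a rough, illustrative sketch of a breadth-first ToT loop in Python. The `generate_thoughts` and `evaluate` helpers are hypothetical stand-ins for LLM calls that propose and score candidate next steps; this is not the reference implementation from either paper.

```python
# Illustrative sketch of a breadth-first Tree-of-Thoughts search loop.
# `generate_thoughts` and `evaluate` are hypothetical LLM-backed helpers
# (prompting a model to propose and score candidate next steps).

def generate_thoughts(state: str, k: int) -> list[str]:
    """Stub: ask the LLM for k candidate next thoughts extending `state`."""
    raise NotImplementedError

def evaluate(state: str) -> float:
    """Stub: ask the LLM to score how promising a partial solution is (0..1)."""
    raise NotImplementedError

def tot_bfs(problem: str, depth: int = 3, breadth: int = 5, keep: int = 2) -> str:
    # Each frontier entry is the problem plus a partial chain of thoughts.
    frontier = [problem]
    for _ in range(depth):
        candidates = []
        for state in frontier:
            for thought in generate_thoughts(state, breadth):
                candidates.append(state + "\n" + thought)
        # Keep only the highest-scoring partial solutions (beam-style pruning).
        candidates.sort(key=evaluate, reverse=True)
        frontier = candidates[:keep]
    # Return the best complete chain of thoughts found.
    return max(frontier, key=evaluate)
```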
[Hulbert (2023)](https://github.com/dave1010/tree-of-thought-prompting) has proposed Tree-of-Thought Prompting, which applies the main concept from ToT frameworks as a simple prompting technique that gets the LLM to evaluate intermediate thoughts in a single prompt. A sample ToT prompt is:
```
Imagine three different experts are answering this question.
All experts will write down 1 step of their thinking,
then share it with the group.
Then all experts will go on to the next step, etc.
If any expert realises they're wrong at any point then they leave.
The question is...
```
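For illustration only, here is a minimal sketch of wrapping this prompt around a question, assuming a hypothetical `llm()` helper that sends text to a model and returns its reply; swap in whichever client you use.

```python
# Minimal sketch of applying the Tree-of-Thought prompt above to a question.
# `llm` is a hypothetical helper standing in for your LLM client of choice.

TOT_PROMPT = """Imagine three different experts are answering this question.
All experts will write down 1 step of their thinking,
then share it with the group.
Then all experts will go on to the next step, etc.
If any expert realises they're wrong at any point then they leave.
The question is..."""

def llm(prompt: str) -> str:
    """Stub: call your model provider here and return the response text."""
    raise NotImplementedError

def tot_answer(question: str) -> str:
    # Append the actual question after "The question is..."
    return llm(TOT_PROMPT + " " + question)
```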