multimodal CoT

pull/22/head
Elvis Saravia 1 year ago
parent db5414fbd6
commit 91495493ed

@@ -89,6 +89,7 @@ The following are the latest papers (sorted by release date) on prompt engineeri
- [Large Language Models are Zero-Shot Reasoners](https://arxiv.org/abs/2205.11916) (May 2022)
- [MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning](https://arxiv.org/abs/2205.00445) (May 2022)
- [Toxicity Detection with Generative Prompt-based Inference](https://arxiv.org/abs/2205.12390) (May 2022)
- [Learning to Transfer Prompts for Text Generation](https://arxiv.org/abs/2205.01543) (May 2022)
- [The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning](https://arxiv.org/abs/2205.03401) (May 2022)
- [A Taxonomy of Prompt Modifiers for Text-To-Image Generation](https://arxiv.org/abs/2204.13988) (Apr 2022)
- [PromptChainer: Chaining Large Language Model Prompts through Visual Programming](https://arxiv.org/abs/2203.06566) (Mar 2022)

@@ -7,7 +7,7 @@ In this section, we discuss other miscellaneous but important topics in prompt e
Topic:
- [Program-Aided Language Models](#program-aided-language-models)
- [ReAct](#react)
- [Multimodal Prompting](#multimodal-prompting)
- [Multimodal CoT Prompting](#multimodal-cot-prompting)
- [GraphPrompts](#graphprompts)
---
@@ -30,10 +30,13 @@ The ReAct framework can allow LLMs to interact with external tools to retrieve a
Full example coming soon!
---
## Multimodal Prompting
In this section, we will cover some examples of multimodal prompting techniques and applications that leverage multiple modalities as opposed to just text alone.
## Multimodal CoT Prompting
Examples coming soon!
[Zhang et al. (2023)](https://arxiv.org/abs/2302.00923) recently proposed a multimodal chain-of-thought prompting approach. Traditional CoT focuses on the language modality alone. In contrast, Multimodal CoT incorporates both text and vision into a two-stage framework: the first stage generates rationales from the combined text and vision inputs; the second stage, answer inference, uses those generated rationales to produce the final answer.
The multimodal CoT model (1B) outperforms GPT-3.5 on the ScienceQA benchmark.
![](../img/multimodal-cot.png)
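The two-stage framework described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: `call_model` is a hypothetical stub standing in for a real vision-language model call, and the question, options, and returned strings are invented for the example.

```python
def call_model(prompt, image=None):
    # Hypothetical stub for a vision-language model call. A real
    # implementation would send the text prompt plus image features
    # to a multimodal model and return its generated text.
    if "Therefore, the answer is" in prompt:
        return "(B) the bulb lights up"
    return "The diagram shows a closed circuit, so current can flow."

def multimodal_cot(question, options, image=None):
    # Stage 1: rationale generation, conditioned on BOTH the text
    # of the question and the vision input.
    rationale_prompt = (
        f"Question: {question}\n"
        f"Options: {', '.join(options)}\n"
        "Look at the image and generate a step-by-step rationale."
    )
    rationale = call_model(rationale_prompt, image=image)

    # Stage 2: answer inference, conditioned on the generated rationale.
    answer_prompt = (
        f"Question: {question}\n"
        f"Options: {', '.join(options)}\n"
        f"Rationale: {rationale}\n"
        "Therefore, the answer is:"
    )
    answer = call_model(answer_prompt, image=image)
    return {"rationale": rationale, "answer": answer}

result = multimodal_cot(
    "What happens when the switch is closed?",
    ["(A) nothing", "(B) the bulb lights up"],
    image=None,  # pass raw image bytes/features here in a real setup
)
print(result["answer"])
```

The key design point is that the answer prompt in stage 2 includes the stage-1 rationale, so the final inference is grounded in reasoning over both modalities rather than the question text alone.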
---
## GraphPrompts

Binary file added: ../img/multimodal-cot.png (171 KiB)
