Prompt-Engineering-Guide/pages/techniques/multimodalcot.zh.mdx
2023-03-30 19:14:59 -06:00

# Multimodal CoT Prompting
import { Callout, FileTree } from 'nextra-theme-docs'
import {Screenshot} from 'components/screenshot'
import MCOT from '../../img/multimodal-cot.png'
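[Zhang et al. (2023)](https://arxiv.org/abs/2302.00923) recently proposed a multimodal chain-of-thought (CoT) prompting approach. Traditional CoT prompting focuses on the language modality. In contrast, Multimodal CoT incorporates text and vision into a two-stage framework. The first stage is rationale generation based on multimodal information. The second stage is answer inference, which leverages the generated rationale.
The Multimodal CoT model (1B parameters) outperforms GPT-3.5 on the ScienceQA benchmark.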
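
The two-stage flow can be sketched roughly as below. This is only an illustrative outline, not the authors' implementation: the `Example` fields, the `multimodal_cot` function, the prompt layout, and the two stub models are all hypothetical names made up for this sketch, and the image is represented by a placeholder feature list rather than a real vision encoder output. It assumes that the second stage receives the original text with the generated rationale appended, together with the same vision input.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical model interface: (text input, image features) -> generated text.
Model = Callable[[str, List[float]], str]


@dataclass
class Example:
    question: str
    context: str
    options: List[str]
    image_features: List[float]  # placeholder for vision-encoder output


def multimodal_cot(example: Example,
                   rationale_model: Model,
                   answer_model: Model) -> Tuple[str, str]:
    """Run the two stages: rationale generation, then answer inference."""
    options = " | ".join(example.options)
    text_input = (
        f"Question: {example.question}\n"
        f"Context: {example.context}\n"
        f"Options: {options}"
    )

    # Stage 1: generate a rationale from both text and vision inputs.
    rationale = rationale_model(text_input, example.image_features)

    # Stage 2: infer the answer from the text plus the generated rationale,
    # again conditioned on the same vision input.
    answer_input = f"{text_input}\nRationale: {rationale}"
    answer = answer_model(answer_input, example.image_features)
    return rationale, answer


# Stub models standing in for fine-tuned multimodal models (assumptions).
def stub_rationale_model(text: str, image_features: List[float]) -> str:
    return "The image shows two magnets with opposite poles facing each other."


def stub_answer_model(text: str, image_features: List[float]) -> str:
    return "attract" if "opposite poles" in text else "repel"


example = Example(
    question="Will these magnets attract or repel each other?",
    context="Two magnets are placed side by side.",
    options=["attract", "repel"],
    image_features=[0.1, 0.2, 0.3],
)
rationale, answer = multimodal_cot(example, stub_rationale_model, stub_answer_model)
print(rationale)
print(answer)
```

Keeping the stages as separate callables reflects the idea that the answer step only sees the rationale as extra text, so either stage could be swapped out independently in this sketch.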
<Screenshot src={MCOT} alt="MCOT" />
Image source: [Zhang et al. (2023)](https://arxiv.org/abs/2302.00923)
Further reading:
- [Language Is Not All You Need: Aligning Perception with Language Models](https://arxiv.org/abs/2302.14045) (Feb 2023)