# Multimodal CoT Prompting
import { Callout, FileTree } from 'nextra-theme-docs'
import {Screenshot} from 'components/screenshot'
import MCOT from '../../img/multimodal-cot.png'
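Recently, [Zhang et al. (2023)](https://arxiv.org/abs/2302.00923) proposed a multimodal chain-of-thought (CoT) prompting approach. Traditional CoT prompting focuses on the language modality. In contrast, Multimodal CoT incorporates text and vision into a two-stage framework. The first stage is rationale generation based on multimodal information. The second stage is answer inference, which leverages the generated rationales.

The Multimodal CoT model (1B) outperforms GPT-3.5 on the ScienceQA benchmark.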
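The two-stage flow described above can be sketched in a few lines of Python. This is only a minimal illustration, not the authors' implementation: the `Model` type, the `multimodal_cot` function, the toy models, and the way the rationale is joined to the question (plain string concatenation) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical interface: a model maps (text, image features) to text.
Model = Callable[[str, Sequence[float]], str]


@dataclass
class MultimodalCoTResult:
    rationale: str
    answer: str


def multimodal_cot(
    question: str,
    image_features: Sequence[float],
    rationale_model: Model,
    answer_model: Model,
) -> MultimodalCoTResult:
    """Run the two stages: rationale generation, then answer inference."""
    # Stage 1: generate a rationale from the text and vision inputs.
    rationale = rationale_model(question, image_features)
    # Stage 2: infer the answer from the text extended with the rationale
    # (assumed here to be simple concatenation) and the same vision input.
    augmented = f"{question}\nRationale: {rationale}"
    answer = answer_model(augmented, image_features)
    return MultimodalCoTResult(rationale=rationale, answer=answer)


# Toy stand-ins so the sketch runs without any real model.
def toy_rationale_model(text: str, image: Sequence[float]) -> str:
    return f"the image has {len(image)} features"


def toy_answer_model(text: str, image: Sequence[float]) -> str:
    return "A" if "Rationale:" in text else "unknown"


result = multimodal_cot(
    "Which option is correct?",
    [0.1, 0.2, 0.3],
    toy_rationale_model,
    toy_answer_model,
)
print(result.rationale)
print(result.answer)
```

Keeping the two stages as separate callables mirrors the framework's split: the answer model only ever sees the question together with a rationale, so either stage can be swapped out independently.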
<Screenshot src={MCOT} alt="MCOT" />
Image Source: [Zhang et al. (2023)](https://arxiv.org/abs/2302.00923)

Further reading:
- [Language Is Not All You Need: Aligning Perception with Language Models](https://arxiv.org/abs/2302.14045) (Feb 2023)