Prompt-Engineering-Guide/pages/techniques/multimodalcot.kr.mdx

# Multimodal CoT Prompting

import { Callout, FileTree } from 'nextra-theme-docs'
import {Screenshot} from 'components/screenshot'
import MCOT from '../../img/multimodal-cot.png'

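[Zhang et al. (2023)](https://arxiv.org/abs/2302.00923) recently proposed a multimodal chain-of-thought (CoT) prompting approach. Traditional CoT focuses on the language modality. In contrast, Multimodal CoT incorporates text and vision into a two-stage framework. The first stage generates a rationale based on the multimodal information. The second stage, answer inference, then uses the generated rationale to derive the answer.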
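
The two-stage flow described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `TwoStageMultimodalCoT` class, the `Generator` signature, and both stub models are hypothetical names introduced here, and passing the image features to the second stage again is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# Hypothetical types for illustration only (not from the paper's code).
ImageFeatures = Sequence[float]
Generator = Callable[[str, ImageFeatures], str]


@dataclass
class TwoStageMultimodalCoT:
    rationale_model: Generator  # stage 1: (text, image features) -> rationale
    answer_model: Generator     # stage 2: (text + rationale, image features) -> answer

    def run(self, question: str, image: ImageFeatures) -> Tuple[str, str]:
        # Stage 1: generate a rationale from the multimodal input.
        rationale = self.rationale_model(question, image)
        # Stage 2: append the rationale to the text input and infer the answer.
        # Assumption: the image features are supplied to this stage as well.
        augmented = f"{question}\nRationale: {rationale}"
        answer = self.answer_model(augmented, image)
        return rationale, answer


# Stub models standing in for real vision-language models.
def stub_rationale_model(text: str, image: ImageFeatures) -> str:
    return f"The image contributes {len(image)} features relevant to the question."


def stub_answer_model(text: str, image: ImageFeatures) -> str:
    # Answers "A" only when a rationale is present in the text input.
    return "A" if "Rationale:" in text else "unknown"


pipeline = TwoStageMultimodalCoT(stub_rationale_model, stub_answer_model)
rationale, answer = pipeline.run("Which option is correct? (A) or (B)", [0.1, 0.2, 0.3])
```

Keeping the two stages as separate callables mirrors the framework's separation of rationale generation from answer inference, so either stage could be swapped out independently in this sketch.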

The authors report that their Multimodal CoT model (1B parameters) outperforms GPT-3.5 on the ScienceQA benchmark.

<Screenshot src={MCOT} alt="MCOT" />
Image Source: [Zhang et al. (2023)](https://arxiv.org/abs/2302.00923)

Further reading:
- [Language Is Not All You Need: Aligning Perception with Language Models](https://arxiv.org/abs/2302.14045) (Feb 2023)