The Role of Chain-of-Thought in Complex Vision-Language Reasoning Task

2023-11-15

Yifan Wu, Pengchuan Zhang, Wenhan Xiong, Barlas Oguz, James C. Gee, Yixin Nie

Abstract

The study explores the effectiveness of the Chain-of-Thought approach, known for its proficiency in language tasks by breaking them down into sub-tasks and intermediate steps, in improving vision-language tasks that demand sophisticated perception and reasoning. We present the "Description then Decision" strategy, which is inspired by how humans process signals. This strategy significantly improves probing task performance by 50%, establishing the groundwork for future research on reasoning paradigms in complex vision-language tasks.
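
As a rough illustration of the two-stage idea the abstract names, the sketch below first asks a model to describe the image and then asks it to decide between two captions using that description. This is a minimal sketch under stated assumptions, not the authors' implementation: the `ask_model` callable, the prompt wording, the A/B answer format, and passing the image again in the second stage are all hypothetical choices made for illustration.

```python
from typing import Callable

# Hypothetical model interface (an assumption, not a real library API):
# takes an image reference and a text prompt, returns the model's text reply.
AskModel = Callable[[str, str], str]


def description_then_decision(
    ask_model: AskModel, image: str, option_a: str, option_b: str
) -> str:
    """Pick caption "A" or "B" for an image using a describe-first prompt."""
    # Stage 1 (description): have the model spell out what it sees
    # before it is asked to commit to an answer.
    description = ask_model(image, "Describe the image in detail.")

    # Stage 2 (decision): feed the description back and ask for a choice.
    # The prompt wording here is illustrative only.
    decision_prompt = (
        f"Image description: {description}\n"
        "Which caption matches the image?\n"
        f"A: {option_a}\n"
        f"B: {option_b}\n"
        "Answer with A or B."
    )
    answer = ask_model(image, decision_prompt).strip().upper()

    # Refuse to guess when the reply is not clearly one of the two options.
    if answer[:1] not in ("A", "B"):
        raise ValueError(f"Unexpected model answer: {answer!r}")
    return answer[:1]
```

Any vision-language client can be wrapped to match the `ask_model` signature; the function itself only fixes the order of the two prompts.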

Tasks

Visual Reasoning

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
Winoground	GPT-4V (CoT, pick b/w two options)	Text Score	75.25	—	Unverified
Winoground	GPT-4V (pick b/w two options)	Text Score	69.25	—	Unverified
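
To make the gap between the two claimed rows concrete, the short calculation below derives the absolute and relative difference in Text Score. It uses only the two claimed values from the table; the variable names are illustrative, and both scores remain unverified.

```python
# Claimed Winoground Text Scores from the table above (unverified).
cot_score = 75.25       # GPT-4V with Chain-of-Thought
baseline_score = 69.25  # GPT-4V without Chain-of-Thought

# Absolute gain in score points and gain relative to the baseline.
absolute_gain = cot_score - baseline_score
relative_gain = absolute_gain / baseline_score * 100

print(f"{absolute_gain:.2f} points, {relative_gain:.1f}% relative")
# prints "6.00 points, 8.7% relative"
```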
