Self-Consistency Improves Chain of Thought Reasoning in Language Models

2022-03-21

Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou

Code

github.com/hughbzhang/o1_inference_scaling_laws (framework: none, stars: 93)
github.com/lastmile-ai/aiconfig/tree/main/cookbooks/Multi-LLM-Consistency (framework: none, stars: 0)
github.com/codelion/optillm/blob/main/optillm/self_consistency.py (framework: pytorch, stars: 0)

Abstract

Chain-of-thought prompting combined with pre-trained large language models has achieved encouraging results on complex reasoning tasks. In this paper, we propose a new decoding strategy, self-consistency, to replace the naive greedy decoding used in chain-of-thought prompting. It first samples a diverse set of reasoning paths instead of only taking the greedy one, and then selects the most consistent answer by marginalizing out the sampled reasoning paths. Self-consistency leverages the intuition that a complex reasoning problem typically admits multiple different ways of thinking leading to its unique correct answer. Our extensive empirical evaluation shows that self-consistency boosts the performance of chain-of-thought prompting with a striking margin on a range of popular arithmetic and commonsense reasoning benchmarks, including GSM8K (+17.9%), SVAMP (+11.0%), AQuA (+12.2%), StrategyQA (+6.4%) and ARC-challenge (+3.9%).
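The sample-then-select procedure described in the abstract can be illustrated with a small sketch. It assumes that "selecting the most consistent answer" is done by an unweighted majority vote over the final answers parsed from each sampled reasoning path, which is one simple way to realize the marginalization. The names `extract_answer`, `self_consistency`, and `make_toy_sampler` are hypothetical, the "The answer is N" output format is an assumption, and the toy sampler stands in for a real language model sampled at non-zero temperature.

```python
import itertools
import re
from collections import Counter
from typing import Callable, Optional


def extract_answer(reasoning_path: str) -> Optional[str]:
    """Parse the final answer from one reasoning path.

    Assumes (hypothetically) that each path ends with "The answer is <number>".
    """
    match = re.search(r"The answer is\s*(-?\d+(?:\.\d+)?)", reasoning_path)
    return match.group(1) if match else None


def self_consistency(
    sample_path: Callable[[str], str],
    prompt: str,
    num_samples: int = 40,
) -> Optional[str]:
    """Sample several reasoning paths and return the most frequent final answer.

    `sample_path` is any callable that returns one sampled chain of thought
    for the prompt. Paths with no parseable answer are skipped.
    """
    answers = []
    for _ in range(num_samples):
        answer = extract_answer(sample_path(prompt))
        if answer is not None:
            answers.append(answer)
    if not answers:
        return None
    # Majority vote; ties go to the answer that was seen first.
    return Counter(answers).most_common(1)[0][0]


def make_toy_sampler() -> Callable[[str], str]:
    """Stand-in for a language model: cycles through fixed reasoning paths."""
    paths = itertools.cycle([
        "There are 3 cars and 2 more arrive, so 3 + 2 = 5. The answer is 5.",
        "Start with 3, then add the 2 new cars to get 5. The answer is 5.",
        "3 times 2 is 6. The answer is 6.",  # a flawed path that gets outvoted
    ])
    return lambda prompt: next(paths)


if __name__ == "__main__":
    question = "There are 3 cars in the lot and 2 more arrive. How many cars are there?"
    print(self_consistency(make_toy_sampler(), question, num_samples=40))
```

Two different correct paths agree on the same answer while the flawed path disagrees, so the vote recovers the answer that the single flawed path would have missed. Greedy decoding corresponds to taking only one path and skipping the vote.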

Tasks

ARC, Arithmetic Reasoning, GSM8K, Language Modelling, Math, StrategyQA

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
GSM8K	PaLM 540B maj1@40 (8-shot)	Accuracy	74.4	—	Unverified
