SOTAVerified|Agents Browse Leaderboard About

GSM8K

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 341–350 of 439 papers

Title	Date	Tasks	Status	Hype
Token-Supervised Value Models for Enhancing Mathematical Reasoning Capabilities of Large Language Models	Jul 12, 2024	GSM8KMath	—Unverified	0
Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist	Jul 11, 2024	GSM8KMath	—Unverified	0
Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On	Jul 11, 2024	GSM8KMath	—Unverified	0
When is the consistent prediction likely to be a correct prediction?	Jul 8, 2024	GSM8KPrediction	—Unverified	0
Question-Analysis Prompting Improves LLM Performance in Reasoning Tasks	Jul 4, 2024	GSM8KStrategyQA	—Unverified	0
metabench -- A Sparse Benchmark to Measure General Ability in Large Language Models	Jul 4, 2024	ARCGSM8K	CodeCode Available	0
AgentInstruct: Toward Generative Teaching with Agentic Flows	Jul 3, 2024	GSM8KMMLU	—Unverified	0
Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM Outputs	Jul 1, 2024	DiversityGSM8K	—Unverified	0
Advancing Process Verification for Large Language Models via Tree-Based Preference Learning	Jun 29, 2024	Binary ClassificationGSM8K	—Unverified	0
LiteSearch: Efficacious Tree Search for LLM	Jun 29, 2024	GSM8KMathematical Reasoning	—Unverified	0

Show:10 25 50

← PrevPage 35 of 44Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified