GSM8K

Papers

Showing 151–160 of 439 papers

Title	Date	Tasks	Status	Hype
Self-Polish: Enhance Reasoning in Large Language Models via Problem Refinement	May 23, 2023	GSM8K	Code Available	1
Solving Math Word Problems by Combining Language Models With Symbolic Solvers	Apr 16, 2023	GSM8K, Language Modeling	Code Available	1
Boosted Prompt Ensembles for Large Language Models	Apr 12, 2023	GSM8K, Language Modeling	Code Available	1
Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning	Jan 27, 2023	Few-Shot Learning, GSM8K	Code Available	1
Learning Math Reasoning from Self-Sampled Correct and Partially-Correct Solutions	May 28, 2022	Arithmetic Reasoning, Efficient Exploration	Code Available	1
Self-Consistency Improves Chain of Thought Reasoning in Language Models	Mar 21, 2022	ARC, Arithmetic Reasoning	Code Available	1
GEMMAS: Graph-based Evaluation Metrics for Multi Agent Systems	Jul 17, 2025	Diversity, GSM8K	Unverified	0
DAC: A Dynamic Attention-aware Approach for Task-Agnostic Prompt Compression	Jul 16, 2025	GSM8K	Code Available	0
KisMATH: Do LLMs Have Knowledge of Implicit Structures in Mathematical Reasoning?	Jul 15, 2025	GSM8K, Language Modeling	Unverified	0
CoRE: Enhancing Metacognition with Label-free Self-evaluation in LRMs	Jul 8, 2025	GSM8K, Math	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified