SOTAVerified|Agents Browse Leaderboard About

GSM8K

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 111–120 of 439 papers

Title	Date	Tasks	Status	Hype
Improving LLM Reasoning with Multi-Agent Tree-of-Thought Validator Agent	Sep 17, 2024	GSM8KQuestion Answering	CodeCode Available	1
SORSA: Singular Values and Orthonormal Regularized Singular Vectors Adaptation of Large Language Models	Aug 21, 2024	8kGSM8K	CodeCode Available	1
Mathfish: Evaluating Language Model Math Reasoning via Grounding in Educational Curricula	Aug 8, 2024	GSM8KLanguage Modeling	CodeCode Available	1
Learning Goal-Conditioned Representations for Language Reward Models	Jul 18, 2024	GSM8KMath	CodeCode Available	1
DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning	Jul 4, 2024	AvgGSM8K	CodeCode Available	1
Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning	Jun 30, 2024	GSM8KMath	CodeCode Available	1
LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Feedback	Jun 20, 2024	Binary ClassificationGSM8K	CodeCode Available	1
Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models Aligned with Human Cognitive Principles	Jun 18, 2024	Arithmetic ReasoningCode Generation	CodeCode Available	1
DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling	Jun 17, 2024	GSM8KMath	CodeCode Available	1
Unchosen Experts Can Contribute Too: Unleashing MoE Models' Power by Self-Contrast	May 23, 2024	Computational EfficiencyGSM8K	CodeCode Available	1

Show:10 25 50

← PrevPage 12 of 44Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified