SOTAVerified

GSM8K

Papers

Showing 141–150 of 439 papers

Title	Date	Tasks	Status	Hype
Recursive Decomposition of Logical Thoughts: Framework for Superior Reasoning and Knowledge Propagation in Large Language Models	Jan 3, 2025	GSM8K, Math	Unverified	0
DIVE: Diversified Iterative Self-Improvement	Jan 1, 2025	Diversity, GSM8K	Code Available	0
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs	Dec 30, 2024	GSM8K	Unverified	0
LLM2: Let Large Language Models Harness System 2 Reasoning	Dec 29, 2024	GSM8K, Mathematical Reasoning	Code Available	0
Natural Language Fine-Tuning	Dec 29, 2024	GSM8K, Large Language Model	Code Available	2
Towards Intrinsic Self-Correction Enhancement in Monte Carlo Tree Search Boosted Reasoning via Iterative Preference Learning	Dec 23, 2024	Arithmetic Reasoning, GSM8K	Unverified	0
Ask-Before-Detection: Identifying and Mitigating Conformity Bias in LLM-Powered Error Detector for Math Word Problem Solutions	Dec 22, 2024	GSM8K, Math	Unverified	0
System-2 Mathematical Reasoning via Enriched Instruction Tuning	Dec 22, 2024	ERP, GSM8K	Unverified	0
Inference Scaling vs Reasoning: An Empirical Analysis of Compute-Optimal LLM Problem-Solving	Dec 20, 2024	Computational Efficiency, GSM8K	Code Available	0
Offline Reinforcement Learning for LLM Multi-Step Reasoning	Dec 20, 2024	GSM8K, Math	Code Available	2

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified