SOTAVerified|Agents Browse Leaderboard About

GSM8K

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 211–220 of 439 papers

Title	Date	Tasks	Status	Hype	Score
Upweighting Easy Samples in Fine-Tuning Mitigates Forgetting	Feb 5, 2025	GSM8KMath	CodeCode Available	0	5
VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation	Jun 25, 2024	ARCBenchmarking	CodeCode Available	0	5
Iterative Reasoning Preference Optimization	Apr 30, 2024	ARCGSM8K	—Unverified	0	0
Efficient Contextual LLM Cascades through Budget-Constrained Policy Learning	Apr 17, 2024	GSM8KNavigate	—Unverified	0	0
MALT: Improving Reasoning with Multi-Agent LLM Training	Dec 2, 2024	Common Sense ReasoningGSM8K	—Unverified	0	0
MAmmoTH2: Scaling Instructions from the Web	May 6, 2024	ChatbotGSM8K	—Unverified	0	0
Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist	Jul 11, 2024	GSM8KMath	—Unverified	0	0
Is your LLM trapped in a Mental Set? Investigative study on how mental sets affect the reasoning capabilities of LLMs	Jan 21, 2025	GSM8KIn-Context Learning	—Unverified	0	0
Interpretable Math Word Problem Solution Generation Via Step-by-step Planning	Jun 1, 2023	GSM8KLanguage Modeling	—Unverified	0	0
MathAttack: Attacking Large Language Models Towards Math Solving Ability	Sep 4, 2023	Adversarial AttackGSM8K	—Unverified	0	0

Show:10 25 50

← PrevPage 22 of 44Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified