SOTAVerified

GSM8K

Papers

Showing 191–200 of 439 papers

Title	Date	Tasks	Status	Hype	Score
Calc-X and Calcformers: Empowering Arithmetical Chain-of-Thought through Interaction with Symbolic Systems	May 24, 2023	Arithmetic Reasoning, GSM8K	Code Available	0	5
Inference-Time Decontamination: Reusing Leaked Benchmarks for Large Language Model Evaluation	Jun 20, 2024	GSM8K, Language Model Evaluation	Code Available	0	5
DeLTa: A Decoding Strategy based on Logit Trajectory Prediction Improves Factuality and Reasoning Ability	Mar 4, 2025	GSM8K, Logical Reasoning	Code Available	0	5
Inference Scaling vs Reasoning: An Empirical Analysis of Compute-Optimal LLM Problem-Solving	Dec 20, 2024	Computational Efficiency, GSM8K	Code Available	0	5
In-Context Principle Learning from Mistakes	Feb 8, 2024	GSM8K, In-Context Learning	Code Available	0	5
A mixed policy to improve performance of language models on math problems	Jul 17, 2023	GSM8K, Math	Code Available	0	5
How to Leverage Demonstration Data in Alignment for Large Language Model? A Self-Imitation Learning Perspective	Oct 14, 2024	Density Ratio Estimation, GSM8K	Code Available	0	5
DAC: A Dynamic Attention-aware Approach for Task-Agnostic Prompt Compression	Jul 16, 2025	GSM8K	Code Available	0	5
NLoRA: Nyström-Initiated Low-Rank Adaptation for Large Language Models	Feb 20, 2025	GSM8K, Natural Language Understanding	Code Available	0	5
Mathematical Reasoning in Large Language Models: Assessing Logical and Arithmetic Errors across Wide Numerical Ranges	Feb 12, 2025	GSM8K, Math	Code Available	0	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified