SOTAVerified|Agents Browse Leaderboard About

GSM8K

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 201–210 of 439 papers

Title	Date	Tasks	Status	Hype	Score
One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks	Oct 14, 2024	FairnessGSM8K	CodeCode Available	0	5
Inference Scaling vs Reasoning: An Empirical Analysis of Compute-Optimal LLM Problem-Solving	Dec 20, 2024	Computational EfficiencyGSM8K	CodeCode Available	0	5
In-Context Principle Learning from Mistakes	Feb 8, 2024	GSM8KIn-Context Learning	CodeCode Available	0	5
LLM-TOPLA: Efficient LLM Ensemble by Maximising Diversity	Oct 4, 2024	DiversityEnsemble Pruning	CodeCode Available	0	5
A mixed policy to improve performance of language models on math problems	Jul 17, 2023	GSM8KMath	CodeCode Available	0	5
How to Leverage Demonstration Data in Alignment for Large Language Model? A Self-Imitation Learning Perspective	Oct 14, 2024	Density Ratio EstimationGSM8K	CodeCode Available	0	5
DAC: A Dynamic Attention-aware Approach for Task-Agnostic Prompt Compression	Jul 16, 2025	GSM8K	CodeCode Available	0	5
GKT: A Novel Guidance-Based Knowledge Transfer Framework For Efficient Cloud-edge Collaboration LLM Deployment	May 30, 2024	GSM8KKnowledge Distillation	CodeCode Available	0	5
MathScale: Scaling Instruction Tuning for Mathematical Reasoning	Mar 5, 2024	GSM8KMath	CodeCode Available	0	5
AdaCtrl: Towards Adaptive and Controllable Reasoning via Difficulty-Aware Budgeting	May 24, 2025	GSM8KReinforcement Learning (RL)	CodeCode Available	0	5

Show:10 25 50

← PrevPage 21 of 44Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified