| PaD: Program-aided Distillation Can Teach Small Models Reasoning Better than Chain-of-thought Fine-tuning | May 23, 2023 | Arithmetic Reasoning, GSM8K | Code Available | 0 |
| One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks | Oct 14, 2024 | Fairness, GSM8K | Code Available | 0 |
| SEGO: Sequential Subgoal Optimization for Mathematical Problem-Solving | Oct 19, 2023 | GSM8K, Math | Code Available | 0 |
| EquivPruner: Boosting Efficiency and Quality in LLM-Based Search via Action Pruning | May 22, 2025 | GSM8K, Math | Code Available | 0 |
| Not All Votes Count! Programs as Verifiers Improve Self-Consistency of Language Models for Math Reasoning | Oct 16, 2024 | All, GSM8K | Code Available | 0 |
| DeLTa: A Decoding Strategy based on Logit Trajectory Prediction Improves Factuality and Reasoning Ability | Mar 4, 2025 | GSM8K, Logical Reasoning | Code Available | 0 |
| NLoRA: Nyström-Initiated Low-Rank Adaptation for Large Language Models | Feb 20, 2025 | GSM8K, Natural Language Understanding | Code Available | 0 |
| A mixed policy to improve performance of language models on math problems | Jul 17, 2023 | GSM8K, Math | Code Available | 0 |
| Enhancing Knowledge Distillation for LLMs with Response-Priming Prompting | Dec 18, 2024 | GSM8K, Knowledge Distillation | Code Available | 0 |
| AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need | Jun 18, 2025 | GSM8K, HumanEval | Code Available | 0 |