SOTAVerified|Agents Browse Leaderboard About

GSM8K

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 151–160 of 439 papers

Title	Date	Tasks	Status	Hype	Score
Matrix Information Theory for Self-Supervised Learning	May 27, 2023	Contrastive LearningGSM8K	CodeCode Available	1	5
FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving	Feb 27, 2025	GSM8KMath	CodeCode Available	1	5
Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates	Feb 28, 2024	GSM8KSafety Alignment	CodeCode Available	1	5
Learning Goal-Conditioned Representations for Language Reward Models	Jul 18, 2024	GSM8KMath	CodeCode Available	1	5
SMART: Self-Aware Agent for Tool Overuse Mitigation	Feb 17, 2025	GSM8KLarge Language Model	CodeCode Available	1	5
LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Feedback	Jun 20, 2024	Binary ClassificationGSM8K	CodeCode Available	1	5
Re-Initialization Token Learning for Tool-Augmented Large Language Models	Jun 17, 2025	GSM8KQuestion Answering	CodeCode Available	0	5
Fill in the Blank: Exploring and Enhancing LLM Capabilities for Backward Reasoning in Math Word Problems	Oct 3, 2023	GSM8KMath	CodeCode Available	0	5
COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement	Oct 12, 2024	Code GenerationComputational Efficiency	CodeCode Available	0	5
Reasoning Under 1 Billion: Memory-Augmented Reinforcement Learning for Large Language Models	Apr 3, 2025	GSM8KReinforcement Learning (RL)	CodeCode Available	0	5

Show:10 25 50

← PrevPage 16 of 44Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified