SOTAVerified|Agents Browse Leaderboard About

GSM8K

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 171–180 of 439 papers

Title	Date	Tasks	Status	Hype
PREMISE: Scalable and Strategic Prompt Optimization for Efficient Mathematical Reasoning in Large Models	Jun 12, 2025	GSM8KMathematical Reasoning	—Unverified	0
Learning a Continue-Thinking Token for Enhanced Test-Time Scaling	Jun 12, 2025	GSM8KMath	CodeCode Available	0
Slimming Down LLMs Without Losing Their Minds	Jun 12, 2025	Computational EfficiencyGSM8K	—Unverified	0
Unsupervised Elicitation of Language Models	Jun 11, 2025	GSM8KTruthfulQA	—Unverified	0
Enhancing Reasoning Capabilities of Small Language Models with Blueprints and Prompt Template Search	Jun 10, 2025	GSM8KMath	—Unverified	0
Guideline Forest: Experience-Induced Multi-Guideline Reasoning with Stepwise Aggregation	Jun 9, 2025	GSM8KHumanEval	—Unverified	0
Text-to-LoRA: Instant Transformer Adaption	Jun 6, 2025	ARCGSM8K	CodeCode Available	0
Automatic Robustness Stress Testing of LLMs as Mathematical Problem Solvers	Jun 5, 2025	GSM8KMath	—Unverified	0
Model Unlearning via Sparse Autoencoder Subspace Guided Projections	May 30, 2025	Adversarial Robustnessfeature selection	—Unverified	0
Evaluation of LLMs for mathematical problem solving	May 30, 2025	GSM8KMathematical Problem-Solving	—Unverified	0

Show:10 25 50

← PrevPage 18 of 44Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified