SOTAVerified|Agents Browse Leaderboard About Blog

GSM8K

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 21–30 of 439 papers

Title	Date	Tasks	Status	Hype
PREMISE: Scalable and Strategic Prompt Optimization for Efficient Mathematical Reasoning in Large Models	Jun 12, 2025	GSM8KMathematical Reasoning	—Unverified	0
Unsupervised Elicitation of Language Models	Jun 11, 2025	GSM8KTruthfulQA	—Unverified	0
Enhancing Reasoning Capabilities of Small Language Models with Blueprints and Prompt Template Search	Jun 10, 2025	GSM8KMath	—Unverified	0
Guideline Forest: Experience-Induced Multi-Guideline Reasoning with Stepwise Aggregation	Jun 9, 2025	GSM8KHumanEval	—Unverified	0
Text-to-LoRA: Instant Transformer Adaption	Jun 6, 2025	ARCGSM8K	CodeCode Available	0
Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties	Jun 6, 2025	GSM8K	CodeCode Available	1
Automatic Robustness Stress Testing of LLMs as Mathematical Problem Solvers	Jun 5, 2025	GSM8KMath	—Unverified	0
Evaluation of LLMs for mathematical problem solving	May 30, 2025	GSM8KMathematical Problem-Solving	—Unverified	0
Model Unlearning via Sparse Autoencoder Subspace Guided Projections	May 30, 2025	Adversarial Robustnessfeature selection	—Unverified	0
Can LLMs Reason Abstractly Over Math Word Problems Without CoT? Disentangling Abstract Formulation From Arithmetic Computation	May 29, 2025	GSM8KMath	—Unverified	0

Show:10 25 50

← PrevPage 3 of 44Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified