SOTAVerified

GSM8K

Papers

Showing 171-180 of 439 papers

Title | Status | Hype
PREMISE: Scalable and Strategic Prompt Optimization for Efficient Mathematical Reasoning in Large Models | | 0
Learning a Continue-Thinking Token for Enhanced Test-Time Scaling | Code | 0
Slimming Down LLMs Without Losing Their Minds | | 0
Unsupervised Elicitation of Language Models | | 0
Enhancing Reasoning Capabilities of Small Language Models with Blueprints and Prompt Template Search | | 0
Guideline Forest: Experience-Induced Multi-Guideline Reasoning with Stepwise Aggregation | | 0
Text-to-LoRA: Instant Transformer Adaption | Code | 0
Automatic Robustness Stress Testing of LLMs as Mathematical Problem Solvers | | 0
Model Unlearning via Sparse Autoencoder Subspace Guided Projections | | 0
Evaluation of LLMs for mathematical problem solving | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Xolver | Accuracy | 98.1 | | Unverified
2 | Orange-mini | 0-shot MRR | 98 | | Unverified