SOTAVerified|Agents Browse Leaderboard About

GSM8K

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 371–380 of 439 papers

Title	Date	Tasks	Status	Hype	Score
Strategic Chain-of-Thought: Guiding Accurate Reasoning in LLMs through Strategy Elicitation	Sep 5, 2024	GSM8K	—Unverified	0	0
Can LLMs Reason Abstractly Over Math Word Problems Without CoT? Disentangling Abstract Formulation From Arithmetic Computation	May 29, 2025	GSM8KMath	—Unverified	0	0
STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning	Sep 10, 2024	GSM8KMixture-of-Experts	—Unverified	0	0
Subtle Errors Matter: Preference Learning via Error-injected Self-editing	Oct 9, 2024	GSM8KMath	—Unverified	0	0
Building Math Agents with Multi-Turn Iterative Preference Learning	Sep 4, 2024	GSM8KMath	—Unverified	0	0
BrainTransformers: SNN-LLM	Oct 3, 2024	ARCGSM8K	—Unverified	0	0
Supervised Optimism Correction: Be Confident When LLMs Are Sure	Apr 10, 2025	GSM8KMath	—Unverified	0	0
Supervisory Prompt Training	Mar 26, 2024	GSM8KSentence	—Unverified	0	0
Sustainable LLM Inference for Edge AI: Evaluating Quantized LLMs for Energy Efficiency, Output Accuracy, and Inference Latency	Apr 4, 2025	BenchmarkingGSM8K	—Unverified	0	0
SymBa: Symbolic Backward Chaining for Structured Natural Language Reasoning	Feb 20, 2024	Arithmetic ReasoningGSM8K	—Unverified	0	0

Show:10 25 50

← PrevPage 38 of 44Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified