SOTAVerified|Agents Browse Leaderboard About

GSM8K

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 231–240 of 439 papers

Title	Date	Tasks	Status	Hype
STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning	Sep 10, 2024	GSM8KMixture-of-Experts	—Unverified	0
Strategic Chain-of-Thought: Guiding Accurate Reasoning in LLMs through Strategy Elicitation	Sep 5, 2024	GSM8K	—Unverified	0
Prompt Baking	Sep 4, 2024	ARCGSM8K	—Unverified	0
CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models	Sep 4, 2024	GSM8KMath	CodeCode Available	2
Building Math Agents with Multi-Turn Iterative Preference Learning	Sep 4, 2024	GSM8KMath	—Unverified	0
S^3c-Math: Spontaneous Step-level Self-correction Makes Large Language Models Better Mathematical Reasoners	Sep 3, 2024	GSM8KMath	—Unverified	0
Logic Contrastive Reasoning with Lightweight Large Language Model for Math Word Problems	Aug 29, 2024	GSM8KLanguage Modeling	—Unverified	0
Critic-CoT: Boosting the reasoning abilities of large language model via Chain-of-thoughts Critic	Aug 29, 2024	GSM8KLanguage Modeling	—Unverified	0
SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models	Aug 28, 2024	Data AugmentationGSM8K	—Unverified	0
SORSA: Singular Values and Orthonormal Regularized Singular Vectors Adaptation of Large Language Models	Aug 21, 2024	8kGSM8K	CodeCode Available	1

Show:10 25 50

← PrevPage 24 of 44Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified