
GSM8K

Papers


Showing 211–220 of 439 papers

Title	Date	Tasks	Status	Hype
LLM-TOPLA: Efficient LLM Ensemble by Maximising Diversity	Oct 4, 2024	Diversity, Ensemble Pruning	Code Available	0
BrainTransformers: SNN-LLM	Oct 3, 2024	ARC, GSM8K	Unverified	0
Unlocking Structured Thinking in Language Models with Cognitive Prompting	Oct 3, 2024	Arithmetic Reasoning, GSM8K	Unverified	0
CodePMP: Scalable Preference Model Pretraining for Large Language Model Reasoning	Oct 3, 2024	GSM8K, Language Modeling	Unverified	0
The Role of Deductive and Inductive Reasoning in Large Language Models	Oct 3, 2024	GSM8K	Unverified	0
Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation	Oct 3, 2024	GSM8K, Math	Unverified	0
PersonaMath: Enhancing Math Reasoning through Persona-Driven Data Augmentation	Oct 2, 2024	Data Augmentation, Diversity	Unverified	0
VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment	Oct 2, 2024	GSM8K, Math	Code Available	2
Scheherazade: Evaluating Chain-of-Thought Math Reasoning in LLMs with Chain-of-Problems	Sep 30, 2024	GSM8K, Math	Code Available	0
Instance-adaptive Zero-shot Chain-of-Thought Prompting	Sep 30, 2024	GSM8K, Math	Unverified	0



Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified