GSM8K

Papers

Showing 291–300 of 439 papers

Title	Date	Tasks	Status	Hype
ReasonAgain: Using Extractable Symbolic Programs to Evaluate Mathematical Reasoning	Oct 24, 2024	GSM8K, Math	Unverified	0
Adaptive Dense Reward: Understanding the Gap Between Action and Reward Space in Alignment	Oct 23, 2024	GSM8K, HumanEval	Unverified	0
Optimizing Chain-of-Thought Reasoning: Tackling Arranging Bottleneck via Plan Augmentation	Oct 22, 2024	GSM8K, Math	Unverified	0
SMART: Self-learning Meta-strategy Agent for Reasoning Tasks	Oct 21, 2024	GSM8K, Self-Learning	Code Available	0
On Designing Effective RL Reward at Training Time for LLM Reasoning	Oct 19, 2024	GSM8K, Math	Unverified	0
TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling	Oct 18, 2024	Computational Efficiency, GSM8K	Unverified	0
SBI-RAG: Enhancing Math Word Problem Solving for Students through Schema-Based Instruction and Retrieval-Augmented Generation	Oct 17, 2024	GSM8K, Language Modeling	Code Available	0
Not All Votes Count! Programs as Verifiers Improve Self-Consistency of Language Models for Math Reasoning	Oct 16, 2024	All, GSM8K	Code Available	0
MIND: Math Informed syNthetic Dialogues for Pretraining LLMs	Oct 15, 2024	GSM8K, Math	Unverified	0
How to Leverage Demonstration Data in Alignment for Large Language Model? A Self-Imitation Learning Perspective	Oct 14, 2024	Density Ratio Estimation, GSM8K	Code Available	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified