SOTAVerified

GSM8K

Papers

Showing 321–330 of 439 papers

Title	Date	Tasks	Status	Hype
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking	Mar 14, 2024	GSM8K, Language Modelling	Code Available	4
Large Language Models are Contrastive Reasoners	Mar 13, 2024	GSM8K	Code Available	1
Prompt Selection and Augmentation for Few Examples Code Generation in Large Language Model and its Application in Robotics Control	Mar 11, 2024	Code Generation, Diversity	Unverified	0
Common 7B Language Models Already Possess Strong Math Capabilities	Mar 7, 2024	GSM8K, Math	Code Available	5
MathScale: Scaling Instruction Tuning for Mathematical Reasoning	Mar 5, 2024	GSM8K, Math	Code Available	0
Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models	Mar 4, 2024	Data Augmentation, GSM8K	Code Available	1
Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning	Mar 4, 2024	GSM8K, Math	Unverified	0
GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers	Feb 29, 2024	GSM8K, Math	Code Available	2
Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates	Feb 28, 2024	GSM8K, Safety Alignment	Code Available	1
MathGenie: Generating Synthetic Data with Question Back-translation for Enhancing Mathematical Reasoning of LLMs	Feb 26, 2024	GSM8K, Math	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified