SOTAVerified

GSM8K

Papers

Showing 401–410 of 439 papers

Title	Date	Tasks	Status	Hype
MathAttack: Attacking Large Language Models Towards Math Solving Ability	Sep 4, 2023	Adversarial Attack, GSM8K	Unverified	0
No Train Still Gain. Unleash Mathematical Reasoning of Large Language Models with Monte Carlo Tree Search Guided by Energy Function	Sep 1, 2023	GSM8K, Mathematical Reasoning	Unverified	0
AskIt: Unified Programming Interface for Programming with Large Language Models	Aug 29, 2023	Code Generation, Few-Shot Learning	Code Available	1
Exploring Equation as a Better Intermediate Meaning Representation for Numerical Reasoning	Aug 21, 2023	GSM8K	Code Available	0
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct	Aug 18, 2023	Arithmetic Reasoning, GSM8K	Code Available	5
Scaling Relationship on Learning Mathematical Reasoning with Large Language Models	Aug 3, 2023	Arithmetic Reasoning, GSM8K	Code Available	2
SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning	Aug 1, 2023	GSM8K, Math	Code Available	1
A mixed policy to improve performance of language models on math problems	Jul 17, 2023	GSM8K, Math	Code Available	0
DiversiGATE: A Comprehensive Framework for Reliable Large Language Models	Jun 22, 2023	Arithmetic Reasoning, GSM8K	Unverified	0
Interpretable Math Word Problem Solution Generation Via Step-by-step Planning	Jun 1, 2023	GSM8K, Language Modeling	Unverified	0

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified