GSM8K

Papers


Showing 401–425 of 439 papers

Title	Date	Tasks	Status	Hype
MathAttack: Attacking Large Language Models Towards Math Solving Ability	Sep 4, 2023	Adversarial Attack, GSM8K	Unverified	0
No Train Still Gain. Unleash Mathematical Reasoning of Large Language Models with Monte Carlo Tree Search Guided by Energy Function	Sep 1, 2023	GSM8K, Mathematical Reasoning	Unverified	0
AskIt: Unified Programming Interface for Programming with Large Language Models	Aug 29, 2023	Code Generation, Few-Shot Learning	Code available	1
Exploring Equation as a Better Intermediate Meaning Representation for Numerical Reasoning	Aug 21, 2023	GSM8K	Code available	0
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct	Aug 18, 2023	Arithmetic Reasoning, GSM8K	Code available	5
Scaling Relationship on Learning Mathematical Reasoning with Large Language Models	Aug 3, 2023	Arithmetic Reasoning, GSM8K	Code available	2
SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning	Aug 1, 2023	GSM8K, Math	Code available	1
A mixed policy to improve performance of language models on math problems	Jul 17, 2023	GSM8K, Math	Code available	0
DiversiGATE: A Comprehensive Framework for Reliable Large Language Models	Jun 22, 2023	Arithmetic Reasoning, GSM8K	Unverified	0
Interpretable Math Word Problem Solution Generation Via Step-by-step Planning	Jun 1, 2023	GSM8K, Language Modeling	Unverified	0
Matrix Information Theory for Self-Supervised Learning	May 27, 2023	Contrastive Learning, GSM8K	Code available	1
Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Language Models	May 26, 2023	GSM8K, Multimodal Reasoning	Code available	3
GRACE: Discriminator-Guided Chain-of-Thought Reasoning	May 24, 2023	GSM8K, Math	Code available	1
Calc-X and Calcformers: Empowering Arithmetical Chain-of-Thought through Interaction with Symbolic Systems	May 24, 2023	Arithmetic Reasoning, GSM8K	Code available	0
Self-Polish: Enhance Reasoning in Large Language Models via Problem Refinement	May 23, 2023	GSM8K	Code available	1
PaD: Program-aided Distillation Can Teach Small Models Reasoning Better than Chain-of-thought Fine-tuning	May 23, 2023	Arithmetic Reasoning, GSM8K	Code available	0
Automatic Model Selection with Large Language Models for Reasoning	May 23, 2023	Arithmetic Reasoning, GSM8K	Code available	1
RCOT: Detecting and Rectifying Factual Inconsistency in Reasoning by Reversing Chain-of-Thought	May 19, 2023	Arithmetic Reasoning, GSM8K	Unverified	0
Hint of Thought prompting: an explainable and zero-shot approach to reasoning tasks with LLMs	May 19, 2023	Arithmetic Reasoning, GSM8K	Unverified	0
Self-Evaluation Guided Beam Search for Reasoning	May 1, 2023	Arithmetic Reasoning, GSM8K	Unverified	0
Progressive-Hint Prompting Improves Reasoning in Large Language Models	Apr 19, 2023	Arithmetic Reasoning, GSM8K	Code available	2
Solving Math Word Problems by Combining Language Models With Symbolic Solvers	Apr 16, 2023	GSM8K, Language Modeling	Code available	1
Boosted Prompt Ensembles for Large Language Models	Apr 12, 2023	GSM8K, Language Modeling	Code available	1
Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning	Jan 27, 2023	Few-Shot Learning, GSM8K	Code available	1
Teaching Small Language Models to Reason	Dec 16, 2022	GSM8K, Knowledge Distillation	Unverified	0


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified