GSM8K

Papers

Showing 401–439 of 439 papers

Title	Date	Tasks	Status
AlignedCoT: Prompting Large Language Models via Native-Speaking Demonstrations	Nov 22, 2023	Common Sense Reasoning, GSM8K	Code Available
First-Step Advantage: Importance of Starting Right in Multi-Step Math Reasoning	Nov 14, 2023	GSM8K, Math	Unverified
SAIE Framework: Support Alone Isn't Enough -- Advancing LLM Training with Adversarial Remarks	Nov 14, 2023	GSM8K, Math	Unverified
The ART of LLM Refinement: Ask, Refine, and Trust	Nov 14, 2023	Arithmetic Reasoning, GSM8K	Unverified
Let's Reinforce Step by Step	Nov 10, 2023	GSM8K, Logical Reasoning	Unverified
Prompt Engineering a Prompt Engineer	Nov 9, 2023	counterfactual, Counterfactual Reasoning	Unverified
The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback	Oct 31, 2023	GSM8K, MMLU	Unverified
SEGO: Sequential Subgoal Optimization for Mathematical Problem-Solving	Oct 19, 2023	GSM8K, Math	Code Available
Let's reward step by step: Step-Level reward model as the Navigators for Reasoning	Oct 16, 2023	Code Generation, GSM8K	Unverified
DavIR: Data Selection via Implicit Reward for Large Language Models	Oct 16, 2023	Causal Language Modeling, GSM8K	Unverified
KwaiYiiMath: Technical Report	Oct 11, 2023	Arithmetic Reasoning, GSM8K	Unverified
From Words to Watts: Benchmarking the Energy Costs of Large Language Model Inference	Oct 4, 2023	Benchmarking, GPU	Unverified
Fill in the Blank: Exploring and Enhancing LLM Capabilities for Backward Reasoning in Math Word Problems	Oct 3, 2023	GSM8K, Math	Code Available
Think before you speak: Training Language Models With Pause Tokens	Oct 3, 2023	Decoder, GSM8K	Unverified
Large Language Models as Analogical Reasoners	Oct 3, 2023	Code Generation, GSM8K	Unverified
Adapting LLM Agents with Universal Feedback in Communication	Oct 1, 2023	Decision Making, GSM8K	Unverified
UPAR: A Kantian-Inspired Prompting Framework for Enhancing Large Language Model Capabilities	Sep 30, 2023	Causal Judgment, GSM8K	Unverified
Contrastive Decoding Improves Reasoning in Large Language Models	Sep 17, 2023	GSM8K, HellaSwag	Unverified
EchoPrompt: Instructing the Model to Rephrase Queries for Improved In-context Learning	Sep 16, 2023	Date Understanding, GSM8K	Code Available
Exploring an LM to generate Prolog Predicates from Mathematics Questions	Sep 7, 2023	GSM8K, Language Modeling	Unverified
MathAttack: Attacking Large Language Models Towards Math Solving Ability	Sep 4, 2023	Adversarial Attack, GSM8K	Unverified
No Train Still Gain. Unleash Mathematical Reasoning of Large Language Models with Monte Carlo Tree Search Guided by Energy Function	Sep 1, 2023	GSM8K, Mathematical Reasoning	Unverified
Exploring Equation as a Better Intermediate Meaning Representation for Numerical Reasoning	Aug 21, 2023	GSM8K	Code Available
A mixed policy to improve performance of language models on math problems	Jul 17, 2023	GSM8K, Math	Code Available
DiversiGATE: A Comprehensive Framework for Reliable Large Language Models	Jun 22, 2023	Arithmetic Reasoning, GSM8K	Unverified
Interpretable Math Word Problem Solution Generation Via Step-by-step Planning	Jun 1, 2023	GSM8K, Language Modeling	Unverified
Calc-X and Calcformers: Empowering Arithmetical Chain-of-Thought through Interaction with Symbolic Systems	May 24, 2023	Arithmetic Reasoning, GSM8K	Code Available
PaD: Program-aided Distillation Can Teach Small Models Reasoning Better than Chain-of-thought Fine-tuning	May 23, 2023	Arithmetic Reasoning, GSM8K	Code Available
RCOT: Detecting and Rectifying Factual Inconsistency in Reasoning by Reversing Chain-of-Thought	May 19, 2023	Arithmetic Reasoning, GSM8K	Unverified
Hint of Thought prompting: an explainable and zero-shot approach to reasoning tasks with LLMs	May 19, 2023	Arithmetic Reasoning, GSM8K	Unverified
Self-Evaluation Guided Beam Search for Reasoning	May 1, 2023	Arithmetic Reasoning, GSM8K	Unverified
Teaching Small Language Models to Reason	Dec 16, 2022	GSM8K, Knowledge Distillation	Unverified
Distilling Reasoning Capabilities into Smaller Language Models	Dec 1, 2022	GSM8K, Knowledge Distillation	Code Available
Explicit Knowledge Transfer for Weakly-Supervised Code Generation	Nov 30, 2022	Code Generation, Few-Shot Learning	Unverified
Solving math word problems with process- and outcome-based feedback	Nov 25, 2022	Arithmetic Reasoning, GSM8K	Unverified
Large Language Models Can Self-Improve	Oct 20, 2022	Arithmetic Reasoning, Common Sense Reasoning	Unverified
Transcending Scaling Laws with 0.1% Extra Compute	Oct 20, 2022	Arithmetic Reasoning, Cross-Lingual Question Answering	Unverified
Complexity-Based Prompting for Multi-Step Reasoning	Oct 3, 2022	Date Understanding, GSM8K	Unverified
Making Large Language Models Better Reasoners with Step-Aware Verifier	Jun 6, 2022	Arithmetic Reasoning, Few-Shot Learning	Unverified

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified