SOTAVerified|Agents Browse Leaderboard About Blog

GSM8K

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 426–439 of 439 papers

Title	Date	Tasks	Status	Hype
Interpretable Math Word Problem Solution Generation Via Step-by-step Planning	Jun 1, 2023	GSM8KLanguage Modeling	—Unverified	0
Calc-X and Calcformers: Empowering Arithmetical Chain-of-Thought through Interaction with Symbolic Systems	May 24, 2023	Arithmetic ReasoningGSM8K	CodeCode Available	0
PaD: Program-aided Distillation Can Teach Small Models Reasoning Better than Chain-of-thought Fine-tuning	May 23, 2023	Arithmetic ReasoningGSM8K	CodeCode Available	0
RCOT: Detecting and Rectifying Factual Inconsistency in Reasoning by Reversing Chain-of-Thought	May 19, 2023	Arithmetic ReasoningGSM8K	—Unverified	0
Hint of Thought prompting: an explainable and zero-shot approach to reasoning tasks with LLMs	May 19, 2023	Arithmetic ReasoningGSM8K	—Unverified	0
Self-Evaluation Guided Beam Search for Reasoning	May 1, 2023	Arithmetic ReasoningGSM8K	—Unverified	0
Teaching Small Language Models to Reason	Dec 16, 2022	GSM8KKnowledge Distillation	—Unverified	0
Distilling Reasoning Capabilities into Smaller Language Models	Dec 1, 2022	GSM8KKnowledge Distillation	CodeCode Available	0
Explicit Knowledge Transfer for Weakly-Supervised Code Generation	Nov 30, 2022	Code GenerationFew-Shot Learning	—Unverified	0
Solving math word problems with process- and outcome-based feedback	Nov 25, 2022	Arithmetic ReasoningGSM8K	—Unverified	0
Large Language Models Can Self-Improve	Oct 20, 2022	Arithmetic ReasoningCommon Sense Reasoning	—Unverified	0
Transcending Scaling Laws with 0.1% Extra Compute	Oct 20, 2022	Arithmetic ReasoningCross-Lingual Question Answering	—Unverified	0
Complexity-Based Prompting for Multi-Step Reasoning	Oct 3, 2022	Date UnderstandingGSM8K	—Unverified	0
Making Large Language Models Better Reasoners with Step-Aware Verifier	Jun 6, 2022	Arithmetic ReasoningFew-Shot Learning	—Unverified	0

Show:10 25 50

← PrevPage 18 of 18Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified