GSM8K

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 126–150 of 439 papers

Title	Date	Tasks	Status	Hype
Don't Trust: Verify -- Grounding LLM Quantitative Reasoning with Autoformalization	Mar 26, 2024	Automated Theorem ProvingGSM8K	CodeCode Available	1
Large Language Models are Contrastive Reasoners	Mar 13, 2024	GSM8K	CodeCode Available	1
Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models	Mar 4, 2024	Data AugmentationGSM8K	CodeCode Available	1
Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates	Feb 28, 2024	GSM8KSafety Alignment	CodeCode Available	1
Distillation Contrastive Decoding: Improving LLMs Reasoning with Contrastive Decoding and Distillation	Feb 21, 2024	Arithmetic ReasoningGSM8K	CodeCode Available	1
Language Models as Science Tutors	Feb 16, 2024	GSM8KMath	CodeCode Available	1
Over-Reasoning and Redundant Calculation of Large Language Models	Jan 21, 2024	GSM8KMath	CodeCode Available	1
Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning	Jan 19, 2024	GSM8KMath	CodeCode Available	1
Evaluating LLMs' Mathematical and Coding Competency through Ontology-guided Interventions	Jan 17, 2024	Arithmetic ReasoningCode Generation	CodeCode Available	1
MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation	Dec 28, 2023	GSM8KLanguage Model Evaluation	CodeCode Available	1
Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations	Dec 14, 2023	Arithmetic ReasoningGSM8K	CodeCode Available	1
Token-Level Adaptation of LoRA Adapters for Downstream Task Generalization	Nov 17, 2023	ARCGSM8K	CodeCode Available	1
Neuro-Symbolic Integration Brings Causal and Reliable Reasoning Proofs	Nov 16, 2023	Arithmetic ReasoningGSM8K	CodeCode Available	1
OVM, Outcome-supervised Value Models for Planning in Mathematical Reasoning	Nov 16, 2023	Arithmetic ReasoningGSM8K	CodeCode Available	1
Data Contamination Quiz: A Tool to Detect and Estimate Contamination in Large Language Models	Nov 10, 2023	GSM8KMemorization	CodeCode Available	1
Breaking Language Barriers in Multilingual Mathematical Reasoning: Insights and Observations	Oct 31, 2023	GSM8KMath	CodeCode Available	1
Learning From Mistakes Makes LLM Better Reasoner	Oct 31, 2023	GSM8KMath	CodeCode Available	1
TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models	Oct 10, 2023	Code GenerationContinual Learning	CodeCode Available	1
Design of Chain-of-Thought in Math Problem Solving	Sep 20, 2023	DiversityGSM8K	CodeCode Available	1
Large Language Models as Optimizers	Sep 7, 2023	GSM8K	CodeCode Available	1
AskIt: Unified Programming Interface for Programming with Large Language Models	Aug 29, 2023	Code GenerationFew-Shot Learning	CodeCode Available	1
SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning	Aug 1, 2023	GSM8KMath	CodeCode Available	1
Matrix Information Theory for Self-Supervised Learning	May 27, 2023	Contrastive LearningGSM8K	CodeCode Available	1
GRACE: Discriminator-Guided Chain-of-Thought Reasoning	May 24, 2023	GSM8KMath	CodeCode Available	1
Automatic Model Selection with Large Language Models for Reasoning	May 23, 2023	Arithmetic ReasoningGSM8K	CodeCode Available	1

Show:10 25 50

← PrevPage 6 of 18Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified