GSM8K

Papers

Showing 101–125 of 439 papers

Title	Date	Tasks	Status	Hype	Score
Learning From Mistakes Makes LLM Better Reasoner	Oct 31, 2023	GSM8K, Math	Code Available	1	5
Learning Math Reasoning from Self-Sampled Correct and Partially-Correct Solutions	May 28, 2022	Arithmetic Reasoning, Efficient Exploration	Code Available	1	5
MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation	Dec 28, 2023	GSM8K, Language Model Evaluation	Code Available	1	5
Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems	Apr 23, 2024	Arithmetic Reasoning, GSM8K	Code Available	1	5
LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization	Oct 27, 2024	GSM8K, HellaSwag	Code Available	1	5
SafeMERGE: Preserving Safety Alignment in Fine-Tuned Large Language Models via Selective Layer-Wise Model Merging	Mar 21, 2025	GSM8K, Safety Alignment	Code Available	1	5
Self-Training Elicits Concise Reasoning in Large Language Models	Feb 27, 2025	GSM8K, In-Context Learning	Code Available	1	5
AskIt: Unified Programming Interface for Programming with Large Language Models	Aug 29, 2023	Code Generation, Few-Shot Learning	Code Available	1	5
Large Language Models as Optimizers	Sep 7, 2023	GSM8K	Code Available	1	5
DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning	Jul 4, 2024	Avg, GSM8K	Code Available	1	5
Don't Trust: Verify -- Grounding LLM Quantitative Reasoning with Autoformalization	Mar 26, 2024	Automated Theorem Proving, GSM8K	Code Available	1	5
Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning	Jan 27, 2023	Few-Shot Learning, GSM8K	Code Available	1	5
Large Language Models are Contrastive Reasoners	Mar 13, 2024	GSM8K	Code Available	1	5
Large (Vision) Language Models are Unsupervised In-Context Learners	Apr 3, 2025	GSM8K, In-Context Learning	Code Available	1	5
Language Models as Science Tutors	Feb 16, 2024	GSM8K, Math	Code Available	1	5
Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates	Feb 28, 2024	GSM8K, Safety Alignment	Code Available	1	5
Matrix Information Theory for Self-Supervised Learning	May 27, 2023	Contrastive Learning, GSM8K	Code Available	1	5
Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models	Mar 4, 2024	Data Augmentation, GSM8K	Code Available	1	5
IRanker: Towards Ranking Foundation Model	Jun 25, 2025	GSM8K, model	Code Available	1	5
Distillation Contrastive Decoding: Improving LLMs Reasoning with Contrastive Decoding and Distillation	Feb 21, 2024	Arithmetic Reasoning, GSM8K	Code Available	1	5
GRACE: Discriminator-Guided Chain-of-Thought Reasoning	May 24, 2023	GSM8K, Math	Code Available	1	5
Breaking Language Barriers in Multilingual Mathematical Reasoning: Insights and Observations	Oct 31, 2023	GSM8K, Math	Code Available	1	5
PromptCoT: Synthesizing Olympiad-level Problems for Mathematical Reasoning in Large Language Models	Mar 4, 2025	GSM8K, Math	Code Available	1	5
Improving LLM Reasoning with Multi-Agent Tree-of-Thought Validator Agent	Sep 17, 2024	GSM8K, Question Answering	Code Available	1	5
Boosted Prompt Ensembles for Large Language Models	Apr 12, 2023	GSM8K, Language Modeling	Code Available	1	5


Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified