GSM8K

Papers


Showing 1–50 of 439 papers

Title	Date	Tasks	Status	Hype
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools	Jun 18, 2024	All, GSM8K	Code Available	14
Qwen2 Technical Report	Jul 15, 2024	Arithmetic Reasoning, GSM8K	Code Available	13
LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning	Mar 26, 2024	GPU, GSM8K	Code Available	9
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression	Mar 19, 2024	GSM8K, Language Modelling	Code Available	9
Qwen2.5-Omni Technical Report	Mar 26, 2025	Automatic Speech Recognition (ASR), GSM8K	Code Available	7
Revisiting MoE and Dense Speed-Accuracy Comparisons for LLM Training	May 23, 2024	GSM8K, Mixture-of-Experts	Code Available	7
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models	Jan 28, 2022	Common Sense Reasoning, GSM8K	Code Available	6
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct	Aug 18, 2023	Arithmetic Reasoning, GSM8K	Code Available	5
LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models	Oct 9, 2023	GSM8K, In-Context Learning	Code Available	5
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B	Jun 11, 2024	Decision Making, GSM8K	Code Available	5
Common 7B Language Models Already Possess Strong Math Capabilities	Mar 7, 2024	GSM8K, Math	Code Available	5
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset	Feb 15, 2024	Arithmetic Reasoning, GSM8K	Code Available	4
SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator	Dec 16, 2024	GSM8K, Language Modeling	Code Available	4
SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights	Oct 11, 2024	GSM8K, Math	Code Available	4
ReFT: Reasoning with Reinforced Fine-Tuning	Jan 17, 2024	GSM8K, Math	Code Available	4
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking	Mar 14, 2024	GSM8K, Language Modelling	Code Available	4
Baichuan 2: Open Large-scale Language Models	Sep 19, 2023	Feature Engineering, GSM8K	Code Available	4
InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning	Feb 9, 2024	Data Augmentation, GSM8K	Code Available	4
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers	Aug 12, 2024	GSM8K, Math	Code Available	4
PAL: Program-aided Language Models	Nov 18, 2022	Arithmetic Reasoning, GSM8K	Code Available	3
Automatic Instruction Evolving for Large Language Models	Jun 2, 2024	GSM8K, HumanEval	Code Available	3
TokenSkip: Controllable Chain-of-Thought Compression in LLMs	Feb 17, 2025	GSM8K	Code Available	3
MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning	May 13, 2024	Data Augmentation, GSM8K	Code Available	3
Thinkless: LLM Learns When to Think	May 19, 2025	GSM8K, Math	Code Available	3
From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step	May 23, 2024	GSM8K	Code Available	3
MARIO: MAth Reasoning with code Interpreter Output -- A Reproducible Pipeline	Jan 16, 2024	GSM8K, Math	Code Available	3
Training Verifiers to Solve Math Word Problems	Oct 27, 2021	GSM8K, Math	Code Available	3
Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning	May 1, 2024	ARC, GSM8K	Code Available	3
LoRA-GA: Low-Rank Adaptation with Gradient Approximation	Jul 6, 2024	GSM8K, parameter-efficient fine-tuning	Code Available	3
Syzygy of Thoughts: Improving LLM CoT with the Minimal Free Resolution	Apr 13, 2025	GSM8K, Math	Code Available	3
Scaling up Masked Diffusion Models on Text	Oct 24, 2024	GSM8K, Language Modeling	Code Available	3
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling	Jul 31, 2024	GSM8K, Math	Code Available	3
SkyMath: Technical Report	Oct 25, 2023	GSM8K, Language Modeling	Code Available	3
Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Language Models	May 26, 2023	GSM8K, Multimodal Reasoning	Code Available	3
PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models	Apr 3, 2024	GSM8K, Quantization	Code Available	3
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding	Apr 25, 2024	GSM8K, HellaSwag	Code Available	3
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs	Jun 26, 2024	Arithmetic Reasoning, GSM8K	Code Available	3
Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models	Oct 10, 2024	GSM8K, Math	Code Available	2
How to Correctly do Semantic Backpropagation on Language-based Agentic Systems	Dec 4, 2024	GSM8K	Code Available	2
GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers	Feb 29, 2024	GSM8K, Math	Code Available	2
Offline Reinforcement Learning for LLM Multi-Step Reasoning	Dec 20, 2024	GSM8K, Math	Code Available	2
Autonomous Data Selection with Zero-shot Generative Classifiers for Mathematical Texts	Feb 12, 2024	Continual Pretraining, GSM8K	Code Available	2
Meta Prompting for AI Systems	Nov 20, 2023	Data Interaction, GSM8K	Code Available	2
Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning	May 5, 2024	GSM8K, Math	Code Available	2
Natural Language Fine-Tuning	Dec 29, 2024	GSM8K, Large Language Model	Code Available	2
Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process	Jul 29, 2024	GSM8K, Math	Code Available	2
Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models	Mar 21, 2025	GSM8K, Question Answering	Code Available	2
Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization	Oct 11, 2024	GSM8K, Language Modeling	Code Available	2
Dynamic Early Exit in Reasoning Models	Apr 22, 2025	GSM8K, Math	Code Available	2
LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters	May 27, 2024	Benchmarking, GSM8K	Code Available	2



Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified