GSM8K

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 51–75 of 439 papers

Title	Date	Tasks	Status	Hype	Score
Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning	May 5, 2024	GSM8KMath	CodeCode Available	2	5
Let LLMs Break Free from Overthinking via Self-Braking Tuning	May 20, 2025	GSM8K	CodeCode Available	2	5
Autonomous Data Selection with Zero-shot Generative Classifiers for Mathematical Texts	Feb 12, 2024	Continual PretrainingGSM8K	CodeCode Available	2	5
Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization	Oct 11, 2024	GSM8KLanguage Modeling	CodeCode Available	2	5
MuggleMath: Assessing the Impact of Query and Response Augmentation on Math Reasoning	Oct 9, 2023	Arithmetic ReasoningData Augmentation	CodeCode Available	2	5
Preference Optimization for Reasoning with Pseudo Feedback	Nov 25, 2024	GSM8KMath	CodeCode Available	2	5
Large Language Models are Zero-Shot Reasoners	May 24, 2022	Arithmetic ReasoningCommon Sense Reasoning	CodeCode Available	2	5
ProcessBench: Identifying Process Errors in Mathematical Reasoning	Dec 9, 2024	GSM8KMath	CodeCode Available	2	5
Dynamic Early Exit in Reasoning Models	Apr 22, 2025	GSM8KMath	CodeCode Available	2	5
Progressive-Hint Prompting Improves Reasoning in Large Language Models	Apr 19, 2023	Arithmetic ReasoningGSM8K	CodeCode Available	2	5
any4: Learned 4-bit Numeric Representation for LLMs	Jul 7, 2025	GPUGSM8K	CodeCode Available	2	5
Offline Reinforcement Learning for LLM Multi-Step Reasoning	Dec 20, 2024	GSM8KMath	CodeCode Available	2	5
Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models	Feb 24, 2025	GSM8KMath	CodeCode Available	2	5
Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models	Oct 10, 2024	GSM8KMath	CodeCode Available	2	5
Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process	Jul 29, 2024	GSM8KMath	CodeCode Available	2	5
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models	Sep 21, 2023	Arithmetic ReasoningGSM8K	CodeCode Available	2	5
Meta Prompting for AI Systems	Nov 20, 2023	Data InteractionGSM8K	CodeCode Available	2	5
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding	Nov 6, 2024	ARCGSM8K	CodeCode Available	2	5
CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models	Mar 28, 2025	GPUGSM8K	CodeCode Available	2	5
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning	Oct 5, 2023	Arithmetic ReasoningGSM8K	CodeCode Available	2	5
CoT-Valve: Length-Compressible Chain-of-Thought Tuning	Feb 13, 2025	GSM8K	CodeCode Available	2	5
Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch	Nov 6, 2023	DecoderGSM8K	CodeCode Available	2	5
Balancing LoRA Performance and Efficiency with Simple Shard Sharing	Sep 19, 2024	Computational EfficiencyGSM8K	CodeCode Available	2	5
Natural Language Fine-Tuning	Dec 29, 2024	GSM8KLarge Language Model	CodeCode Available	2	5
Language Models are Multilingual Chain-of-Thought Reasoners	Oct 6, 2022	GSM8KMath	CodeCode Available	2	5

Show:10 25 50

← PrevPage 3 of 18Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified