SOTAVerified

GSM8K

Papers

Showing 26–50 of 439 papers

Title | Status | Hype
LoRA-GA: Low-Rank Adaptation with Gradient Approximation | Code | 3
TokenSkip: Controllable Chain-of-Thought Compression in LLMs | Code | 3
PAL: Program-aided Language Models | Code | 3
Syzygy of Thoughts: Improving LLM CoT with the Minimal Free Resolution | Code | 3
Training Verifiers to Solve Math Word Problems | Code | 3
Scaling up Masked Diffusion Models on Text | Code | 3
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling | Code | 3
SkyMath: Technical Report | Code | 3
Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Language Models | Code | 3
PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models | Code | 3
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding | Code | 3
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs | Code | 3
Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models | Code | 2
How to Correctly do Semantic Backpropagation on Language-based Agentic Systems | Code | 2
Natural Language Fine-Tuning | Code | 2
Offline Reinforcement Learning for LLM Multi-Step Reasoning | Code | 2
Autonomous Data Selection with Zero-shot Generative Classifiers for Mathematical Texts | Code | 2
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models | Code | 2
GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers | Code | 2
Meta Prompting for AI Systems | Code | 2
Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process | Code | 2
Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models | Code | 2
LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters | Code | 2
CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models | Code | 2
Let LLMs Break Free from Overthinking via Self-Braking Tuning | Code | 2
Page 2 of 18

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Xolver | Accuracy | 98.1 | | Unverified
2 | Orange-mini | 0-shot MRR | 98 | | Unverified