SOTAVerified

GSM8K

Papers

Showing 151–200 of 439 papers

| Title | Status | Hype |
| --- | --- | --- |
| Unchosen Experts Can Contribute Too: Unleashing MoE Models' Power by Self-Contrast | Code | 1 |
| LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization | Code | 1 |
| Markovian Transformers for Informative Language Modeling | Code | 1 |
| FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving | Code | 1 |
| Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations | Code | 1 |
| UTMath: Math Evaluation with Unit Test via Reasoning-to-Coding Thoughts | Code | 1 |
| Fill in the Blank: Exploring and Enhancing LLM Capabilities for Backward Reasoning in Math Word Problems | Code | 0 |
| COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement | Code | 0 |
| metabench -- A Sparse Benchmark to Measure General Ability in Large Language Models | Code | 0 |
| The Price of Format: Diversity Collapse in LLMs | Code | 0 |
| Exploring LLM Reasoning Through Controlled Prompt Variations | Code | 0 |
| Exploring Equation as a Better Intermediate Meaning Representation for Numerical Reasoning | Code | 0 |
| AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need | Code | 0 |
| Activation Steering for Chain-of-Thought Compression | Code | 0 |
| AlignedCoT: Prompting Large Language Models via Native-Speaking Demonstrations | Code | 0 |
| Text-to-LoRA: Instant Transformer Adaption | Code | 0 |
| EquivPruner: Boosting Efficiency and Quality in LLM-Based Search via Action Pruning | Code | 0 |
| SMART: Self-learning Meta-strategy Agent for Reasoning Tasks | Code | 0 |
| CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation | Code | 0 |
| Enhancing Knowledge Distillation for LLMs with Response-Priming Prompting | Code | 0 |
| Adaptive Rectification Sampling for Test-Time Compute Scaling | Code | 0 |
| EchoPrompt: Instructing the Model to Rephrase Queries for Improved In-context Learning | Code | 0 |
| ShareLoRA: Parameter Efficient and Robust Large Language Model Fine-tuning via Shared Low-Rank Adaptation | Code | 0 |
| LLM2: Let Large Language Models Harness System 2 Reasoning | Code | 0 |
| Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective | Code | 0 |
| SEGO: Sequential Subgoal Optimization for Mathematical Problem-Solving | Code | 0 |
| SBI-RAG: Enhancing Math Word Problem Solving for Students through Schema-Based Instruction and Retrieval-Augmented Generation | Code | 0 |
| Re-Initialization Token Learning for Tool-Augmented Large Language Models | Code | 0 |
| Learning a Continue-Thinking Token for Enhanced Test-Time Scaling | Code | 0 |
| Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls | Code | 0 |
| Reasoning Under 1 Billion: Memory-Augmented Reinforcement Learning for Large Language Models | Code | 0 |
| Can LLMs Reason in the Wild with Programs? | Code | 0 |
| Scaling Speculative Decoding with Lookahead Reasoning | Code | 0 |
| DIVE: Diversified Iterative Self-Improvement | Code | 0 |
| ArithmAttack: Evaluating Robustness of LLMs to Noisy Context in Math Problem Solving | Code | 0 |
| Distilling Reasoning Capabilities into Smaller Language Models | Code | 0 |
| One Language, Many Gaps: Evaluating Dialect Fairness and Robustness of Large Language Models in Reasoning Tasks | Code | 0 |
| Discriminative Policy Optimization for Token-Level Reward Models | Code | 0 |
| DiscQuant: A Quantization Method for Neural Networks Inspired by Discrepancy Theory | Code | 0 |
| PaD: Program-aided Distillation Can Teach Small Models Reasoning Better than Chain-of-thought Fine-tuning | Code | 0 |
| Calc-X and Calcformers: Empowering Arithmetical Chain-of-Thought through Interaction with Symbolic Systems | Code | 0 |
| Inference-Time Decontamination: Reusing Leaked Benchmarks for Large Language Model Evaluation | Code | 0 |
| DeLTa: A Decoding Strategy based on Logit Trajectory Prediction Improves Factuality and Reasoning Ability | Code | 0 |
| Inference Scaling vs Reasoning: An Empirical Analysis of Compute-Optimal LLM Problem-Solving | Code | 0 |
| In-Context Principle Learning from Mistakes | Code | 0 |
| A mixed policy to improve performance of language models on math problems | Code | 0 |
| How to Leverage Demonstration Data in Alignment for Large Language Model? A Self-Imitation Learning Perspective | Code | 0 |
| DAC: A Dynamic Attention-aware Approach for Task-Agnostic Prompt Compression | Code | 0 |
| NLoRA: Nyström-Initiated Low-Rank Adaptation for Large Language Models | Code | 0 |
| Mathematical Reasoning in Large Language Models: Assessing Logical and Arithmetic Errors across Wide Numerical Ranges | Code | 0 |
Page 4 of 9

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Xolver | Accuracy | 98.1 | | Unverified |
| 2 | Orange-mini | 0-shot MRR | 98 | | Unverified |