GSM8K

Papers

Showing 151–175 of 439 papers

Title	Date	Tasks	Status	Hype	Score
Large Language Models as Optimizers	Sep 7, 2023	GSM8K	Code Available	1	5
Lexico: Extreme KV Cache Compression via Sparse Coding over Universal Dictionaries	Dec 12, 2024	4k, GSM8K	Code Available	1	5
Design of Chain-of-Thought in Math Problem Solving	Sep 20, 2023	Diversity, GSM8K	Code Available	1	5
SORSA: Singular Values and Orthonormal Regularized Singular Vectors Adaptation of Large Language Models	Aug 21, 2024	8k, GSM8K	Code Available	1	5
Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties	Jun 6, 2025	GSM8K	Code Available	1	5
Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team	Jun 17, 2025	Code Generation, GSM8K	Code Available	1	5
Fill in the Blank: Exploring and Enhancing LLM Capabilities for Backward Reasoning in Math Word Problems	Oct 3, 2023	GSM8K, Math	Code Available	0	5
COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement	Oct 12, 2024	Code Generation, Computational Efficiency	Code Available	0	5
SEGO: Sequential Subgoal Optimization for Mathematical Problem-Solving	Oct 19, 2023	GSM8K, Math	Code Available	0	5
Exploring LLM Reasoning Through Controlled Prompt Variations	Apr 2, 2025	GSM8K, Mathematical Problem-Solving	Code Available	0	5
Exploring Equation as a Better Intermediate Meaning Representation for Numerical Reasoning	Aug 21, 2023	GSM8K	Code Available	0	5
AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need	Jun 18, 2025	GSM8K, HumanEval	Code Available	0	5
ShareLoRA: Parameter Efficient and Robust Large Language Model Fine-tuning via Shared Low-Rank Adaptation	Jun 16, 2024	Continual Learning, GSM8K	Code Available	0	5
Scaling Speculative Decoding with Lookahead Reasoning	Jun 24, 2025	GPU, GSM8K	Code Available	0	5
Scheherazade: Evaluating Chain-of-Thought Math Reasoning in LLMs with Chain-of-Problems	Sep 30, 2024	GSM8K, Math	Code Available	0	5
SBI-RAG: Enhancing Math Word Problem Solving for Students through Schema-Based Instruction and Retrieval-Augmented Generation	Oct 17, 2024	GSM8K, Language Modeling	Code Available	0	5
Activation Steering for Chain-of-Thought Compression	Jul 7, 2025	GSM8K, Math	Code Available	0	5
EquivPruner: Boosting Efficiency and Quality in LLM-Based Search via Action Pruning	May 22, 2025	GSM8K, Math	Code Available	0	5
CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation	Feb 28, 2025	GSM8K	Code Available	0	5
Reasoning Under 1 Billion: Memory-Augmented Reinforcement Learning for Large Language Models	Apr 3, 2025	GSM8K, Reinforcement Learning (RL)	Code Available	0	5
Enhancing Knowledge Distillation for LLMs with Response-Priming Prompting	Dec 18, 2024	GSM8K, Knowledge Distillation	Code Available	0	5
Adaptive Rectification Sampling for Test-Time Compute Scaling	Apr 2, 2025	GSM8K, Logical Reasoning	Code Available	0	5
Re-Initialization Token Learning for Tool-Augmented Large Language Models	Jun 17, 2025	GSM8K, Question Answering	Code Available	0	5
EchoPrompt: Instructing the Model to Rephrase Queries for Improved In-context Learning	Sep 16, 2023	Date Understanding, GSM8K	Code Available	0	5
Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective	Feb 20, 2025	GSM8K, Math	Code Available	0	5

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified