GSM8K

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 401–425 of 439 papers

Title	Date	Tasks	Status
MathScale: Scaling Instruction Tuning for Mathematical Reasoning	Mar 5, 2024	GSM8KMath	CodeCode Available
Activation Steering for Chain-of-Thought Compression	Jul 7, 2025	GSM8KMath	CodeCode Available
Mathematical Reasoning in Large Language Models: Assessing Logical and Arithmetic Errors across Wide Numerical Ranges	Feb 12, 2025	GSM8KMath	CodeCode Available
Text-to-LoRA: Instant Transformer Adaption	Jun 6, 2025	ARCGSM8K	CodeCode Available
Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts?	Mar 23, 2025	GSM8KMath	CodeCode Available
metabench -- A Sparse Benchmark to Measure General Ability in Large Language Models	Jul 4, 2024	ARCGSM8K	CodeCode Available
DAC: A Dynamic Attention-aware Approach for Task-Agnostic Prompt Compression	Jul 16, 2025	GSM8K	CodeCode Available
Adaptive Rectification Sampling for Test-Time Compute Scaling	Apr 2, 2025	GSM8KLogical Reasoning	CodeCode Available
LogicPro: Improving Complex Logical Reasoning via Program-Guided Learning	Sep 19, 2024	GSM8KLogical Reasoning	CodeCode Available
The Price of Format: Diversity Collapse in LLMs	May 25, 2025	DiversityGSM8K	CodeCode Available
ShareLoRA: Parameter Efficient and Robust Large Language Model Fine-tuning via Shared Low-Rank Adaptation	Jun 16, 2024	Continual LearningGSM8K	CodeCode Available
EchoPrompt: Instructing the Model to Rephrase Queries for Improved In-context Learning	Sep 16, 2023	Date UnderstandingGSM8K	CodeCode Available
LLM-TOPLA: Efficient LLM Ensemble by Maximising Diversity	Oct 4, 2024	DiversityEnsemble Pruning	CodeCode Available
LLM2: Let Large Language Models Harness System 2 Reasoning	Dec 29, 2024	GSM8KMathematical Reasoning	CodeCode Available
COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement	Oct 12, 2024	Code GenerationComputational Efficiency	CodeCode Available
Upweighting Easy Samples in Fine-Tuning Mitigates Forgetting	Feb 5, 2025	GSM8KMath	CodeCode Available
Learning a Continue-Thinking Token for Enhanced Test-Time Scaling	Jun 12, 2025	GSM8KMath	CodeCode Available
Inference-Time Decontamination: Reusing Leaked Benchmarks for Large Language Model Evaluation	Jun 20, 2024	GSM8KLanguage Model Evaluation	CodeCode Available
SMART: Self-learning Meta-strategy Agent for Reasoning Tasks	Oct 21, 2024	GSM8KSelf-Learning	CodeCode Available
Can LLMs Reason in the Wild with Programs?	Jun 19, 2024	GSM8KMath	CodeCode Available
VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation	Jun 25, 2024	ARCBenchmarking	CodeCode Available
Inference Scaling vs Reasoning: An Empirical Analysis of Compute-Optimal LLM Problem-Solving	Dec 20, 2024	Computational EfficiencyGSM8K	CodeCode Available
Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective	Feb 20, 2025	GSM8KMath	CodeCode Available
In-Context Principle Learning from Mistakes	Feb 8, 2024	GSM8KIn-Context Learning	CodeCode Available
AlignedCoT: Prompting Large Language Models via Native-Speaking Demonstrations	Nov 22, 2023	Common Sense ReasoningGSM8K	CodeCode Available

Show:10 25 50

← PrevPage 17 of 18Next →

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	Xolver	Accuracy	98.1	—	Unverified
2	Orange-mini	0-shot MRR	98	—	Unverified