| LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization | Oct 27, 2024 | GSM8KHellaSwag | CodeCode Available | 1 |
| Multiple-Choice Questions are Efficient and Robust LLM Evaluators | May 20, 2024 | GSM8KHumanEval | CodeCode Available | 1 |
| Design of Chain-of-Thought in Math Problem Solving | Sep 20, 2023 | DiversityGSM8K | CodeCode Available | 1 |
| FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving | Feb 27, 2025 | GSM8KMath | CodeCode Available | 1 |
| Neuro-Symbolic Integration Brings Causal and Reliable Reasoning Proofs | Nov 16, 2023 | Arithmetic ReasoningGSM8K | CodeCode Available | 1 |
| Self-Consistency Improves Chain of Thought Reasoning in Language Models | Mar 21, 2022 | ARCArithmetic Reasoning | CodeCode Available | 1 |
| Fine-Grained Self-Endorsement Improves Factuality and Reasoning | Feb 23, 2024 | GSM8KLanguage Modeling | —Unverified | 0 |
| FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning | Oct 8, 2024 | GSM8KHallucination | —Unverified | 0 |
| CoRE: Enhancing Metacognition with Label-free Self-evaluation in LRMs | Jul 8, 2025 | GSM8KMath | —Unverified | 0 |
| Automatic Robustness Stress Testing of LLMs as Mathematical Problem Solvers | Jun 5, 2025 | GSM8KMath | —Unverified | 0 |
| Fast on the Easy, Deep on the Hard: Efficient Reasoning via Powered Length Penalty | Jun 12, 2025 | GSM8K | —Unverified | 0 |
| A Careful Examination of Large Language Model Performance on Grade School Arithmetic | May 1, 2024 | GSM8KLanguage Modeling | —Unverified | 0 |
| Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving | May 20, 2024 | GSM8KMath | —Unverified | 0 |
| Falcon: Faster and Parallel Inference of Large Language Models through Enhanced Semi-Autoregressive Drafting and Custom-Designed Decoding Tree | Dec 17, 2024 | GSM8KHumanEval | —Unverified | 0 |
| Cool-Fusion: Fuse Large Language Models without Training | Jul 29, 2024 | Combinatorial OptimizationGSM8K | —Unverified | 0 |
| Automatic Prompt Selection for Large Language Models | Apr 3, 2024 | GSM8KQuestion Answering | —Unverified | 0 |
| ControlMath: Controllable Data Generation Promotes Math Generalist Models | Sep 20, 2024 | Data AugmentationDiversity | —Unverified | 0 |
| Maximizing Confidence Alone Improves Reasoning | May 28, 2025 | GSM8KMath | —Unverified | 0 |
| Memory-Efficient LLM Training by Various-Grained Low-Rank Projection of Gradients | May 3, 2025 | GSM8KMMLU | —Unverified | 0 |
| Exploring an LM to generate Prolog Predicates from Mathematics Questions | Sep 7, 2023 | GSM8KLanguage Modeling | —Unverified | 0 |
| Explicit Knowledge Transfer for Weakly-Supervised Code Generation | Nov 30, 2022 | Code GenerationFew-Shot Learning | —Unverified | 0 |
| Contrastive Decoding Improves Reasoning in Large Language Models | Sep 17, 2023 | GSM8KHellaSwag | —Unverified | 0 |
| Excessive Reasoning Attack on Reasoning LLMs | Jun 17, 2025 | GSM8K | —Unverified | 0 |
| Evolving LLMs' Self-Refinement Capability via Iterative Preference Optimization | Feb 8, 2025 | GSM8KMath | —Unverified | 0 |
| Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost | Jul 29, 2024 | GSM8KPrompt Engineering | —Unverified | 0 |
| Evolutionary Pre-Prompt Optimization for Mathematical Reasoning | Dec 5, 2024 | Few-Shot LearningGSM8K | —Unverified | 0 |
| Evaluation of LLMs for mathematical problem solving | May 30, 2025 | GSM8KMathematical Problem-Solving | —Unverified | 0 |
| Complexity-Based Prompting for Multi-Step Reasoning | Oct 3, 2022 | Date UnderstandingGSM8K | —Unverified | 0 |
| Advancing Process Verification for Large Language Models via Tree-Based Preference Learning | Jun 29, 2024 | Binary ClassificationGSM8K | —Unverified | 0 |
| Entropy-Guided Watermarking for LLMs: A Test-Time Framework for Robust and Traceable Text Generation | Apr 16, 2025 | GSM8KMath | —Unverified | 0 |
| MathGenie: Generating Synthetic Data with Question Back-translation for Enhancing Mathematical Reasoning of LLMs | Feb 26, 2024 | GSM8KMath | —Unverified | 0 |
| Enhancing Reasoning Capabilities of Small Language Models with Blueprints and Prompt Template Search | Jun 10, 2025 | GSM8KMath | —Unverified | 0 |
| AutoJudge: Judge Decoding Without Manual Annotation | Apr 28, 2025 | GSM8KLarge Language Model | —Unverified | 0 |
| CodePMP: Scalable Preference Model Pretraining for Large Language Model Reasoning | Oct 3, 2024 | GSM8KLanguage Modeling | —Unverified | 0 |
| Elastic Weight Consolidation for Full-Parameter Continual Pre-Training of Gemma2 | May 9, 2025 | ARCBelebele | —Unverified | 0 |
| Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation | Oct 3, 2024 | GSM8KMath | —Unverified | 0 |
| MathFimer: Enhancing Mathematical Reasoning by Expanding Reasoning Steps through Fill-in-the-Middle Task | Feb 17, 2025 | Code CompletionGSM8K | —Unverified | 0 |
| Efficient Fine-Tuning of Quantized Models via Adaptive Rank and Bitwidth | May 2, 2025 | GSM8KQuantization | —Unverified | 0 |
| Efficient Data Selection at Scale via Influence Distillation | May 25, 2025 | GSM8KMMLU | —Unverified | 0 |
| ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference | Feb 1, 2025 | GPUGSM8K | —Unverified | 0 |
| AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection | May 12, 2025 | GSM8KHumanEval | —Unverified | 0 |
| LLaMa-SciQ: An Educational Chatbot for Answering Science MCQ | Sep 25, 2024 | ChatbotGSM8K | —Unverified | 0 |
| Dynamic Parallel Tree Search for Efficient LLM Reasoning | Feb 22, 2025 | Computational EfficiencyGSM8K | —Unverified | 0 |
| LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models | May 25, 2025 | GSM8KHumanEval | —Unverified | 0 |
| LiteSearch: Efficacious Tree Search for LLM | Jun 29, 2024 | GSM8KMathematical Reasoning | —Unverified | 0 |
| Assessing the Impact of Prompting Methods on ChatGPT's Mathematical Capabilities | Dec 22, 2023 | ChatbotGSM8K | —Unverified | 0 |
| Leveraging Uncertainty Estimation for Efficient LLM Routing | Feb 16, 2025 | GSM8KMMLU | —Unverified | 0 |
| Dynamic Subset Tuning: Expanding the Operational Range of Parameter-Efficient Training for Large Language Models | Nov 13, 2024 | GSM8K | —Unverified | 0 |
| Let's reward step by step: Step-Level reward model as the Navigators for Reasoning | Oct 16, 2023 | Code GenerationGSM8K | —Unverified | 0 |
| MathDivide: Improved mathematical reasoning by large language models | May 12, 2024 | GSM8KLogical Reasoning | —Unverified | 0 |