| When is the consistent prediction likely to be a correct prediction? | Jul 8, 2024 | GSM8K, Prediction | —Unverified | 0 | 0 |
| Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning | Apr 18, 2025 | All, GSM8K | —Unverified | 0 | 0 |
| Accelerating Chain-of-Thought Reasoning: When Goal-Gradient Importance Meets Dynamic Skipping | May 13, 2025 | Domain Generalization, GSM8K | —Unverified | 0 | 0 |
| No Train Still Gain. Unleash Mathematical Reasoning of Large Language Models with Monte Carlo Tree Search Guided by Energy Function | Sep 1, 2023 | GSM8K, Mathematical Reasoning | —Unverified | 0 | 0 |
| Nudging: Inference-time Alignment of LLMs via Guided Decoding | Oct 11, 2024 | General Knowledge, GSM8K | —Unverified | 0 | 0 |
| Fine-Grained Self-Endorsement Improves Factuality and Reasoning | Feb 23, 2024 | GSM8K, Language Modeling | —Unverified | 0 | 0 |
| FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning | Oct 8, 2024 | GSM8K, Hallucination | —Unverified | 0 | 0 |
| On Designing Effective RL Reward at Training Time for LLM Reasoning | Oct 19, 2024 | GSM8K, Math | —Unverified | 0 | 0 |
| Uncertainty Aware Learning for Language Model Alignment | Jun 7, 2024 | GSM8K, Language Modeling | —Unverified | 0 | 0 |
| Making Large Language Models Better Reasoners with Step-Aware Verifier | Jun 6, 2022 | Arithmetic Reasoning, Few-Shot Learning | —Unverified | 0 | 0 |
| Fast on the Easy, Deep on the Hard: Efficient Reasoning via Powered Length Penalty | Jun 12, 2025 | GSM8K | —Unverified | 0 | 0 |
| Optimizing Chain-of-Thought Reasoning: Tackling Arranging Bottleneck via Plan Augmentation | Oct 22, 2024 | GSM8K, Math | —Unverified | 0 | 0 |
| Orca-Math: Unlocking the potential of SLMs in Grade School Math | Feb 16, 2024 | Arithmetic Reasoning, GSM8K | —Unverified | 0 | 0 |
| Falcon: Faster and Parallel Inference of Large Language Models through Enhanced Semi-Autoregressive Drafting and Custom-Designed Decoding Tree | Dec 17, 2024 | GSM8K, HumanEval | —Unverified | 0 | 0 |
| Exploring an LM to generate Prolog Predicates from Mathematics Questions | Sep 7, 2023 | GSM8K, Language Modeling | —Unverified | 0 | 0 |
| Uncertainty-Aware Search and Value Models: Mitigating Search Scaling Flaws in LLMs | Feb 16, 2025 | GSM8K, Thompson Sampling | —Unverified | 0 | 0 |
| Explicit Knowledge Transfer for Weakly-Supervised Code Generation | Nov 30, 2022 | Code Generation, Few-Shot Learning | —Unverified | 0 | 0 |
| PARAMANU-GANITA: Language Model with Mathematical Capabilities | Apr 22, 2024 | Domain Adaptation, GSM8K | —Unverified | 0 | 0 |
| Patience Is The Key to Large Language Model Reasoning | Nov 20, 2024 | GSM8K, Language Modeling | —Unverified | 0 | 0 |
| PersonaMath: Enhancing Math Reasoning through Persona-Driven Data Augmentation | Oct 2, 2024 | Data Augmentation, Diversity | —Unverified | 0 | 0 |
| Pheromone-based Learning of Optimal Reasoning Paths | Jan 31, 2025 | ARC, GSM8K | —Unverified | 0 | 0 |
| Excessive Reasoning Attack on Reasoning LLMs | Jun 17, 2025 | GSM8K | —Unverified | 0 | 0 |
| Evolving LLMs' Self-Refinement Capability via Iterative Preference Optimization | Feb 8, 2025 | GSM8K, Math | —Unverified | 0 | 0 |
| Plan for Speed -- Dilated Scheduling for Masked Diffusion Language Models | Jun 23, 2025 | Code Completion, GSM8K | —Unverified | 0 | 0 |
| PMPO: Probabilistic Metric Prompt Optimization for Small and Large Language Models | May 22, 2025 | GSM8K, Large Language Model | —Unverified | 0 | 0 |
| PMSS: Pretrained Matrices Skeleton Selection for LLM Fine-tuning | Sep 25, 2024 | GSM8K, Math | —Unverified | 0 | 0 |
| PortLLM: Personalizing Evolving Large Language Models with Training-Free and Portable Model Patches | Oct 8, 2024 | GPU, GSM8K | —Unverified | 0 | 0 |
| PORT: Preference Optimization on Reasoning Traces | Jun 23, 2024 | ARC, GSM8K | —Unverified | 0 | 0 |
| Position-Aware Depth Decay Decoding (D^3): Boosting Large Language Model Inference Efficiency | Mar 11, 2025 | GSM8K, Language Modeling | —Unverified | 0 | 0 |
| Predicting Emergent Capabilities by Finetuning | Nov 25, 2024 | CoLA, GSM8K | —Unverified | 0 | 0 |
| Evolutionary Pre-Prompt Optimization for Mathematical Reasoning | Dec 5, 2024 | Few-Shot Learning, GSM8K | —Unverified | 0 | 0 |
| Premise Order Matters in Reasoning with Large Language Models | Feb 14, 2024 | GSM8K, Mathematical Problem-Solving | —Unverified | 0 | 0 |
| PREMISE: Scalable and Strategic Prompt Optimization for Efficient Mathematical Reasoning in Large Models | Jun 12, 2025 | GSM8K, Mathematical Reasoning | —Unverified | 0 | 0 |
| Evaluation of LLMs for mathematical problem solving | May 30, 2025 | GSM8K, Mathematical Problem-Solving | —Unverified | 0 | 0 |
| Entropy-Guided Watermarking for LLMs: A Test-Time Framework for Robust and Traceable Text Generation | Apr 16, 2025 | GSM8K, Math | —Unverified | 0 | 0 |
| Prompt Baking | Sep 4, 2024 | ARC, GSM8K | —Unverified | 0 | 0 |
| Enhancing Reasoning Capabilities of Small Language Models with Blueprints and Prompt Template Search | Jun 10, 2025 | GSM8K, Math | —Unverified | 0 | 0 |
| Prompt Engineering a Prompt Engineer | Nov 9, 2023 | counterfactual, Counterfactual Reasoning | —Unverified | 0 | 0 |
| Prompt-SAW: Leveraging Relation-Aware Graphs for Textual Prompt Compression | Mar 30, 2024 | GSM8K, Relation | —Unverified | 0 | 0 |
| Prompt Selection and Augmentation for Few Examples Code Generation in Large Language Model and its Application in Robotics Control | Mar 11, 2024 | Code Generation, Diversity | —Unverified | 0 | 0 |
| Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning | Jun 20, 2024 | GSM8K, Heuristic Search | —Unverified | 0 | 0 |
| Quasi-random Multi-Sample Inference for Large Language Models | Nov 9, 2024 | Diversity, GSM8K | —Unverified | 0 | 0 |
| Elastic Weight Consolidation for Full-Parameter Continual Pre-Training of Gemma2 | May 9, 2025 | ARC, Belebele | —Unverified | 0 | 0 |
| Question-Analysis Prompting Improves LLM Performance in Reasoning Tasks | Jul 4, 2024 | GSM8K, StrategyQA | —Unverified | 0 | 0 |
| Question Tokens Deserve More Attention: Enhancing Large Language Models without Training through Step-by-Step Reading and Question Attention Recalibration | Apr 13, 2025 | GSM8K | —Unverified | 0 | 0 |
| Efficient Fine-Tuning of Quantized Models via Adaptive Rank and Bitwidth | May 2, 2025 | GSM8K, Quantization | —Unverified | 0 | 0 |
| Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement | Sep 18, 2024 | GSM8K, Math | —Unverified | 0 | 0 |
| Efficient Data Selection at Scale via Influence Distillation | May 25, 2025 | GSM8K, MMLU | —Unverified | 0 | 0 |
| Dynamic Subset Tuning: Expanding the Operational Range of Parameter-Efficient Training for Large Language Models | Nov 13, 2024 | GSM8K | —Unverified | 0 | 0 |
| RCOT: Detecting and Rectifying Factual Inconsistency in Reasoning by Reversing Chain-of-Thought | May 19, 2023 | Arithmetic Reasoning, GSM8K | —Unverified | 0 | 0 |