| Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad | Mar 27, 2025 | Math, Mathematical Reasoning | —Unverified | 0 | 0 |
| ProRefine: Inference-time Prompt Refinement with Textual Feedback | Jun 5, 2025 | Mathematical Reasoning | —Unverified | 0 | 0 |
| Quantization Meets Reasoning: Exploring LLM Low-Bit Quantization Degradation for Mathematical Reasoning | Jan 6, 2025 | Math, Mathematical Reasoning | —Unverified | 0 | 0 |
| Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement | Sep 18, 2024 | GSM8K, Math | —Unverified | 0 | 0 |
| Random Feedback Alignment Algorithms to train Neural Networks: Why do they Align? | Jun 4, 2023 | Mathematical Reasoning | —Unverified | 0 | 0 |
| Real-Time Verification of Embodied Reasoning for Generative Skill Acquisition | May 16, 2025 | Mathematical Reasoning | —Unverified | 0 | 0 |
| ReasonAgain: Using Extractable Symbolic Programs to Evaluate Mathematical Reasoning | Oct 24, 2024 | GSM8K, Math | —Unverified | 0 | 0 |
| Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment | Feb 5, 2025 | GSM8K, HumanEval | —Unverified | 0 | 0 |
| Reasoning in Conversation: Solving Subjective Tasks through Dialogue Simulation for Large Language Models | Feb 27, 2024 | Dark Humor Detection, Dialogue Generation | —Unverified | 0 | 0 |
| MMLU-SR: A Benchmark for Stress-Testing Reasoning Capability of Large Language Models | Jun 15, 2024 | Mathematical Reasoning, MMLU | —Unverified | 0 | 0 |
| Recognizing and Verifying Mathematical Equations using Multiplicative Differential Neural Units | Apr 7, 2021 | Mathematical Reasoning | —Unverified | 0 | 0 |
| Reinforcement Learning from Reflective Feedback (RLRF): Aligning and Improving LLMs via Fine-Grained Self-Reflection | Mar 21, 2024 | Mathematical Reasoning | —Unverified | 0 | 0 |
| Reliable and Efficient Amortized Model-based Evaluation | Mar 17, 2025 | Diagnostic, Mathematical Reasoning | —Unverified | 0 | 0 |
| Reliable Natural Language Understanding with Large Language Models and Answer Set Programming | Feb 7, 2023 | Mathematical Reasoning, Natural Language Understanding | —Unverified | 0 | 0 |
| Reliable Reasoning Beyond Natural Language | Jul 16, 2024 | GSM8K, Mathematical Reasoning | —Unverified | 0 | 0 |
| ReTool: Reinforcement Learning for Strategic Tool Use in LLMs | Apr 15, 2025 | Math, Mathematical Reasoning | —Unverified | 0 | 0 |
| Retrieval-Augmented Process Reward Model for Generalizable Mathematical Reasoning | Feb 20, 2025 | Mathematical Reasoning, Retrieval | —Unverified | 0 | 0 |
| Revisiting Chain-of-Thought Prompting: Zero-shot Can Be Stronger than Few-shot | Jun 17, 2025 | In-Context Learning, Mathematical Reasoning | —Unverified | 0 | 0 |
| Revisiting Multi-Agent Debate as Test-Time Scaling: A Systematic Study of Conditional Effectiveness | May 29, 2025 | Diversity, Large Language Model | —Unverified | 0 | 0 |
| Revisiting Overthinking in Long Chain-of-Thought from the Perspective of Self-Doubt | May 29, 2025 | Mathematical Reasoning | —Unverified | 0 | 0 |
| Revisiting Self-Consistency from Dynamic Distributional Alignment Perspective on Answer Aggregation | Feb 27, 2025 | Diversity, Mathematical Reasoning | —Unverified | 0 | 0 |
| Revisiting Test-Time Scaling: A Survey and a Diversity-Aware Method for Efficient Reasoning | Jun 5, 2025 | Diversity, Mathematical Reasoning | —Unverified | 0 | 0 |
| Revisiting the Superficial Alignment Hypothesis | Sep 27, 2024 | Instruction Following, Math | —Unverified | 0 | 0 |
| RL-finetuning LLMs from on- and off-policy data with a single algorithm | Mar 25, 2025 | Mathematical Reasoning | —Unverified | 0 | 0 |
| Robustness Assessment of Mathematical Reasoning in the Presence of Missing and Contradictory Conditions | Jun 7, 2024 | Hallucination, Mathematical Reasoning | —Unverified | 0 | 0 |
| RV-Syn: Rational and Verifiable Mathematical Reasoning Data Synthesis based on Structured Function Library | Apr 29, 2025 | Data Augmentation, Mathematical Reasoning | —Unverified | 0 | 0 |
| S^3c-Math: Spontaneous Step-level Self-correction Makes Large Language Models Better Mathematical Reasoners | Sep 3, 2024 | GSM8K, Math | —Unverified | 0 | 0 |
| SAAS: Solving Ability Amplification Strategy for Enhanced Mathematical Reasoning in Large Language Models | Apr 5, 2024 | Mathematical Reasoning | —Unverified | 0 | 0 |
| Sail into the Headwind: Alignment via Robust Rewards and Dynamic Labels against Reward Hacking | Dec 12, 2024 | Mathematical Reasoning | —Unverified | 0 | 0 |
| Sample, Don't Search: Rethinking Test-Time Alignment for Language Models | Apr 4, 2025 | GSM8K, Mathematical Reasoning | —Unverified | 0 | 0 |
| Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search | Feb 4, 2025 | Mathematical Reasoning | —Unverified | 0 | 0 |
| SAT Solvers and Computer Algebra Systems: A Powerful Combination for Mathematics | Jul 9, 2019 | Mathematical Proofs, Mathematical Reasoning | —Unverified | 0 | 0 |
| SEED-GRPO: Semantic Entropy Enhanced GRPO for Uncertainty-Aware Policy Optimization | May 18, 2025 | Math, Mathematical Reasoning | —Unverified | 0 | 0 |
| Selective Code Generation for Functional Guarantees | May 19, 2025 | Code Generation, Hallucination | —Unverified | 0 | 0 |
| Selective Self-to-Supervised Fine-Tuning for Generalization in Large Language Models | Feb 12, 2025 | Mathematical Reasoning, MMLU | —Unverified | 0 | 0 |
| Self-Evolved Preference Optimization for Enhancing Mathematical Reasoning in Small Language Models | Mar 4, 2025 | GSM8K, Math | —Unverified | 0 | 0 |
| Self-Training Large Language Models for Tool-Use Without Demonstrations | Feb 9, 2025 | GSM8K, Mathematical Reasoning | —Unverified | 0 | 0 |
| Sens-Merging: Sensitivity-Guided Parameter Balancing for Merging Large Language Models | Feb 18, 2025 | Code Generation, General Knowledge | —Unverified | 0 | 0 |
| Can Graph Descriptive Order Affect Solving Graph Problems with LLMs? | Feb 11, 2024 | Descriptive, Language Modelling | —Unverified | 0 | 0 |
| SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models | Aug 28, 2024 | Data Augmentation, GSM8K | —Unverified | 0 | 0 |
| Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models | Aug 1, 2023 | In-Context Learning, Math | —Unverified | 0 | 0 |
| Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On | Jul 11, 2024 | GSM8K, Math | —Unverified | 0 | 0 |
| Slimming Down LLMs Without Losing Their Minds | Jun 12, 2025 | Computational Efficiency, GSM8K | —Unverified | 0 | 0 |
| SMART: A Situation Model for Algebra Story Problems via Attributed Grammar | Dec 27, 2020 | Math, Mathematical Reasoning | —Unverified | 0 | 0 |
| SMART: Self-Generating and Self-Validating Multi-Dimensional Assessment for LLMs' Mathematical Problem Solving | May 22, 2025 | Diagnostic, Mathematical Problem-Solving | —Unverified | 0 | 0 |
| SmolTulu: Higher Learning Rate to Batch Size Ratios Can Lead to Better Reasoning in SLMs | Dec 11, 2024 | ARC, GSM8K | —Unverified | 0 | 0 |
| SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning | Apr 27, 2025 | Large Language Model, Mathematical Reasoning | —Unverified | 0 | 0 |
| Speculative Decoding for Multi-Sample Inference | Mar 7, 2025 | Mathematical Reasoning | —Unverified | 0 | 0 |
| SSR: Speculative Parallel Scaling Reasoning in Test-time | May 21, 2025 | Diversity, Math | —Unverified | 0 | 0 |
| STEM-POM: Evaluating Language Models Math-Symbol Reasoning in Document Parsing | Nov 1, 2024 | 2k, In-Context Learning | —Unverified | 0 | 0 |