| Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad | Mar 27, 2025 | Math, Mathematical Reasoning | —Unverified | 0 | 0 |
| ProRefine: Inference-time Prompt Refinement with Textual Feedback | Jun 5, 2025 | Mathematical Reasoning | —Unverified | 0 | 0 |
| Quantization Meets Reasoning: Exploring LLM Low-Bit Quantization Degradation for Mathematical Reasoning | Jan 6, 2025 | Math, Mathematical Reasoning | —Unverified | 0 | 0 |
| Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement | Sep 18, 2024 | GSM8K, Math | —Unverified | 0 | 0 |
| Random Feedback Alignment Algorithms to train Neural Networks: Why do they Align? | Jun 4, 2023 | Mathematical Reasoning | —Unverified | 0 | 0 |
| Real-Time Verification of Embodied Reasoning for Generative Skill Acquisition | May 16, 2025 | Mathematical Reasoning | —Unverified | 0 | 0 |
| ReasonAgain: Using Extractable Symbolic Programs to Evaluate Mathematical Reasoning | Oct 24, 2024 | GSM8K, Math | —Unverified | 0 | 0 |
| Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment | Feb 5, 2025 | GSM8K, HumanEval | —Unverified | 0 | 0 |
| Reasoning in Conversation: Solving Subjective Tasks through Dialogue Simulation for Large Language Models | Feb 27, 2024 | Dark Humor Detection, Dialogue Generation | —Unverified | 0 | 0 |
| MMLU-SR: A Benchmark for Stress-Testing Reasoning Capability of Large Language Models | Jun 15, 2024 | Mathematical Reasoning, MMLU | —Unverified | 0 | 0 |