| oneDAL Optimization for ARM Scalable Vector Extension: Maximizing Efficiency for High-Performance Data Science | Apr 5, 2025 | Math | Unverified | 0 |
| Explain with Visual Keypoints Like a Real Mentor! A Benchmark for Multimodal Solution Explanation | Apr 4, 2025 | Math, Mathematical Reasoning | Unverified | 0 |
| Online Difficulty Filtering for Reasoning Oriented Reinforcement Learning | Apr 4, 2025 | Math, reinforcement-learning | Unverified | 0 |
| Large (Vision) Language Models are Unsupervised In-Context Learners | Apr 3, 2025 | GSM8K, In-Context Learning | Code Available | 1 |
| MegaMath: Pushing the Limits of Open Math Corpora | Apr 3, 2025 | Diversity, Math | Code Available | 2 |
| BlenderGym: Benchmarking Foundational Model Systems for Graphics Editing | Apr 2, 2025 | 3D Reconstruction, Benchmarking | Code Available | 1 |
| Cross-Lingual Consistency: A Novel Inference Framework for Advancing Reasoning in Large Language Models | Apr 2, 2025 | Math | Unverified | 0 |
| How Difficulty-Aware Staged Reinforcement Learning Enhances LLMs' Reasoning Capabilities: A Preliminary Experimental Study | Apr 1, 2025 | Code Generation, Math | Unverified | 0 |
| GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning | Apr 1, 2025 | Math, Mathematical Reasoning | Unverified | 0 |
| Hawkeye: Efficient Reasoning with Model Collaboration | Apr 1, 2025 | Math, model | Unverified | 0 |
| Brains vs. Bytes: Evaluating LLM Proficiency in Olympiad Mathematics | Apr 1, 2025 | Math, Mathematical Problem-Solving | Unverified | 0 |
| Investigating Large Language Models in Diagnosing Students' Cognitive Skills in Math Problem-solving | Apr 1, 2025 | Math | Unverified | 0 |
| Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead | Mar 31, 2025 | Math, Spatial Reasoning | Code Available | 2 |
| Entropy-Based Adaptive Weighting for Self-Training | Mar 31, 2025 | GSM8K, Math | Code Available | 1 |
| An extrapolated and provably convergent algorithm for nonlinear matrix decomposition with the ReLU function | Mar 31, 2025 | Data Compression, Math | Code Available | 0 |
| DebFlow: Automating Agent Creation via Agent Debate | Mar 31, 2025 | Math | Unverified | 0 |
| ToRL: Scaling Tool-Integrated RL | Mar 30, 2025 | Math, reinforcement-learning | Code Available | 3 |
| Learning to Reason for Long-Form Story Generation | Mar 28, 2025 | Form, Math | Code Available | 2 |
| QuestBench: Can LLMs ask the right question to acquire information in reasoning tasks? | Mar 28, 2025 | Logical Reasoning, Math | Code Available | 1 |
| CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models | Mar 28, 2025 | GPU, GSM8K | Code Available | 2 |
| ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models | Mar 27, 2025 | Math | Code Available | 1 |
| Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad | Mar 27, 2025 | Math, Mathematical Reasoning | Unverified | 0 |
| Challenging the Boundaries of Reasoning: An Olympiad-Level Math Benchmark for Large Language Models | Mar 27, 2025 | Data Visualization, Math | Code Available | 0 |
| Effective Skill Unlearning through Intervention and Abstention | Mar 27, 2025 | General Knowledge, Math | Code Available | 0 |
| Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators | Mar 25, 2025 | Math | Unverified | 0 |