| BlenderGym: Benchmarking Foundational Model Systems for Graphics Editing | Apr 2, 2025 | 3D Reconstruction, Benchmarking | Code Available | 1 |
| Cross-Lingual Consistency: A Novel Inference Framework for Advancing Reasoning in Large Language Models | Apr 2, 2025 | Math | Unverified | 0 |
| How Difficulty-Aware Staged Reinforcement Learning Enhances LLMs' Reasoning Capabilities: A Preliminary Experimental Study | Apr 1, 2025 | Code Generation, Math | Unverified | 0 |
| GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning | Apr 1, 2025 | Math, Mathematical Reasoning | Unverified | 0 |
| Hawkeye: Efficient Reasoning with Model Collaboration | Apr 1, 2025 | Math, model | Unverified | 0 |
| Brains vs. Bytes: Evaluating LLM Proficiency in Olympiad Mathematics | Apr 1, 2025 | Math, Mathematical Problem-Solving | Unverified | 0 |
| Investigating Large Language Models in Diagnosing Students' Cognitive Skills in Math Problem-solving | Apr 1, 2025 | Math | Unverified | 0 |
| Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead | Mar 31, 2025 | Math, Spatial Reasoning | Code Available | 2 |
| Entropy-Based Adaptive Weighting for Self-Training | Mar 31, 2025 | GSM8K, Math | Code Available | 1 |
| An extrapolated and provably convergent algorithm for nonlinear matrix decomposition with the ReLU function | Mar 31, 2025 | Data Compression, Math | Code Available | 0 |