| Taming Generative Diffusion Prior for Universal Blind Image Restoration | Aug 21, 2024 | Image Restoration, Mathematical Reasoning | —Unverified | 0 | 0 |
| Tangram: Benchmark for Evaluating Geometric Element Recognition in Large Multimodal Models | Aug 25, 2024 | Mathematical Reasoning | —Unverified | 0 | 0 |
| Teaching LLMs According to Their Aptitude: Adaptive Reasoning for Mathematical Problem Solving | Feb 17, 2025 | Math, Mathematical Problem-Solving | —Unverified | 0 | 0 |
| TeleMath: A Benchmark for Large Language Models in Telecom Mathematical Problem Solving | Jun 12, 2025 | Logical Reasoning, Mathematical Problem-Solving | —Unverified | 0 | 0 |
| Temporalizing Confidence: Evaluation of Chain-of-Thought Reasoning with Signal Temporal Logic | Jun 9, 2025 | Mathematical Reasoning | —Unverified | 0 | 0 |
| Test-time Scaling Techniques in Theoretical Physics -- A Comparison of Methods on the TPBench Dataset | Jun 25, 2025 | Mathematical Reasoning | —Unverified | 0 | 0 |
| Text Generation Beyond Discrete Token Sampling | May 20, 2025 | Code Generation, Mathematical Reasoning | —Unverified | 0 | 0 |
| The Axiom-Based Atlas: A Structural Mapping of Theorems via Foundational Proof Vectors | Mar 31, 2025 | Mathematical Reasoning | —Unverified | 0 | 0 |
| The Karp Dataset | Jan 24, 2025 | Benchmarking, Mathematical Reasoning | —Unverified | 0 | 0 |
| The Lessons of Developing Process Reward Models in Mathematical Reasoning | Jan 13, 2025 | Mathematical Reasoning | —Unverified | 0 | 0 |