| Towards Efficient and Effective Alignment of Large Language Models | Jun 11, 2025 | Mathematical Reasoning, Meta-Learning | —Unverified | 0 |
| SarcasmBench: Towards Evaluating Large Language Models on Sarcasm Understanding | Aug 21, 2024 | Logical Reasoning, Mathematical Reasoning | —Unverified | 0 |
| Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning | Oct 9, 2024 | Mathematical Reasoning | —Unverified | 0 |
| Towards Spoken Mathematical Reasoning: Benchmarking Speech-based Models over Multi-faceted Math Problems | May 21, 2025 | Benchmarking, Math | —Unverified | 0 |
| Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning | Feb 25, 2025 | Math, Mathematical Reasoning | —Unverified | 0 |
| Towards Tractable Mathematical Reasoning: Challenges, Strategies, and Opportunities for Solving Math Word Problems | Oct 29, 2021 | Answer Generation, Math | —Unverified | 0 |
| Towards Understanding Multi-Round Large Language Model Reasoning: Approximability, Learnability and Generalizability | Mar 5, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| TPO: Aligning Large Language Models with Multi-branch & Multi-step Preference Trees | Oct 10, 2024 | Mathematical Reasoning | —Unverified | 0 |
| Training-Free Mitigation of Language Reasoning Degradation After Multimodal Instruction Tuning | Dec 4, 2024 | GSM8K, Language Modeling | —Unverified | 0 |
| Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning | May 21, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| Tropical Attention: Neural Algorithmic Reasoning for Combinatorial Algorithms | May 22, 2025 | Adversarial Attack, Benchmarking | —Unverified | 0 |
| UGMathBench: A Diverse and Dynamic Benchmark for Undergraduate-Level Mathematical Reasoning with Large Language Models | Jan 23, 2025 | Mathematical Reasoning | —Unverified | 0 |
| Uncertainty-Aware Step-wise Verification with Generative Reward Models | Feb 16, 2025 | Mathematical Reasoning, Uncertainty Quantification | —Unverified | 0 |
| Understand, Solve and Translate: Bridging the Multilingual Mathematical Reasoning Gap | Jan 5, 2025 | Math, Mathematical Reasoning | —Unverified | 0 |
| Uni-LoRA: One Vector is All You Need | Jun 1, 2025 | All, Mathematical Reasoning | —Unverified | 0 |
| Universal Self-Consistency for Large Language Model Generation | Nov 29, 2023 | Code Generation, Language Modeling | —Unverified | 0 |
| Unlocking the Potential of Difficulty Prior in RL-based Multimodal Reasoning | May 19, 2025 | 2k, Mathematical Reasoning | —Unverified | 0 |
| Evaluating Mathematical Reasoning Across Large Language Models: A Fine-Grained Approach | Mar 13, 2025 | Formal Logic, Mathematical Reasoning | —Unverified | 0 |
| VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks | Jul 17, 2025 | Math, Mathematical Reasoning | —Unverified | 0 |
| VerifierQ: Enhancing LLM Test Time Compute with Q-Learning-based Verifiers | Oct 10, 2024 | Mathematical Reasoning, Q-Learning | —Unverified | 0 |
| VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos | Jun 5, 2025 | Benchmarking, Mathematical Reasoning | —Unverified | 0 |
| VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning | Oct 30, 2024 | Benchmarking, Hallucination | —Unverified | 0 |
| Weaker LLMs' Opinions Also Matter: Mixture of Opinions Enhances LLM's Mathematical Reasoning | Feb 26, 2025 | GSM8K, Mathematical Reasoning | —Unverified | 0 |
| WebChoreArena: Evaluating Web Browsing Agents on Realistic Tedious Web Tasks | Jun 2, 2025 | Large Language Model, Mathematical Reasoning | —Unverified | 0 |
| What Are Step-Level Reward Models Rewarding? Counterintuitive Findings from MCTS-Boosted Mathematical Reasoning | Dec 20, 2024 | Mathematical Reasoning | —Unverified | 0 |