| Towards Efficient and Effective Alignment of Large Language Models | Jun 11, 2025 | Mathematical Reasoning, Meta-Learning | —Unverified | 0 |
| SarcasmBench: Towards Evaluating Large Language Models on Sarcasm Understanding | Aug 21, 2024 | Logical Reasoning, Mathematical Reasoning | —Unverified | 0 |
| Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning | Oct 9, 2024 | Mathematical Reasoning | —Unverified | 0 |
| Towards Spoken Mathematical Reasoning: Benchmarking Speech-based Models over Multi-faceted Math Problems | May 21, 2025 | Benchmarking, Math | —Unverified | 0 |
| Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning | Feb 25, 2025 | Math, Mathematical Reasoning | —Unverified | 0 |
| Towards Tractable Mathematical Reasoning: Challenges, Strategies, and Opportunities for Solving Math Word Problems | Oct 29, 2021 | Answer Generation, Math | —Unverified | 0 |
| Towards Understanding Multi-Round Large Language Model Reasoning: Approximability, Learnability and Generalizability | Mar 5, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| TPO: Aligning Large Language Models with Multi-branch & Multi-step Preference Trees | Oct 10, 2024 | Mathematical Reasoning | —Unverified | 0 |
| Training-Free Mitigation of Language Reasoning Degradation After Multimodal Instruction Tuning | Dec 4, 2024 | GSM8K, Language Modeling | —Unverified | 0 |
| Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning | May 21, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| Tropical Attention: Neural Algorithmic Reasoning for Combinatorial Algorithms | May 22, 2025 | Adversarial Attack, Benchmarking | —Unverified | 0 |
| UGMathBench: A Diverse and Dynamic Benchmark for Undergraduate-Level Mathematical Reasoning with Large Language Models | Jan 23, 2025 | Mathematical Reasoning | —Unverified | 0 |
| Uncertainty-Aware Step-wise Verification with Generative Reward Models | Feb 16, 2025 | Mathematical Reasoning, Uncertainty Quantification | —Unverified | 0 |
| Understand, Solve and Translate: Bridging the Multilingual Mathematical Reasoning Gap | Jan 5, 2025 | Math, Mathematical Reasoning | —Unverified | 0 |
| Uni-LoRA: One Vector is All You Need | Jun 1, 2025 | All, Mathematical Reasoning | —Unverified | 0 |
| Universal Self-Consistency for Large Language Model Generation | Nov 29, 2023 | Code Generation, Language Modeling | —Unverified | 0 |
| Unlocking the Potential of Difficulty Prior in RL-based Multimodal Reasoning | May 19, 2025 | 2k, Mathematical Reasoning | —Unverified | 0 |
| Evaluating Mathematical Reasoning Across Large Language Models: A Fine-Grained Approach | Mar 13, 2025 | Formal Logic, Mathematical Reasoning | —Unverified | 0 |
| VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks | Jul 17, 2025 | Math, Mathematical Reasoning | —Unverified | 0 |
| VerifierQ: Enhancing LLM Test Time Compute with Q-Learning-based Verifiers | Oct 10, 2024 | Mathematical Reasoning, Q-Learning | —Unverified | 0 |
| VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos | Jun 5, 2025 | Benchmarking, Mathematical Reasoning | —Unverified | 0 |
| VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning | Oct 30, 2024 | Benchmarking, Hallucination | —Unverified | 0 |
| Weaker LLMs' Opinions Also Matter: Mixture of Opinions Enhances LLM's Mathematical Reasoning | Feb 26, 2025 | GSM8K, Mathematical Reasoning | —Unverified | 0 |
| WebChoreArena: Evaluating Web Browsing Agents on Realistic Tedious Web Tasks | Jun 2, 2025 | Large Language Model, Mathematical Reasoning | —Unverified | 0 |
| What Are Step-Level Reward Models Rewarding? Counterintuitive Findings from MCTS-Boosted Mathematical Reasoning | Dec 20, 2024 | Mathematical Reasoning | —Unverified | 0 |
| Why are NLP Models Fumbling at Elementary Math? A Survey of Automatic Word Problem Solvers | Jan 16, 2022 | Math, Mathematical Reasoning | —Unverified | 0 |
| Why are NLP Models Fumbling at Elementary Math? A Survey of Deep Learning based Word Problem Solvers | May 31, 2022 | Math, Mathematical Reasoning | —Unverified | 0 |
| WirelessMathBench: A Mathematical Modeling Benchmark for LLMs in Wireless Communications | May 20, 2025 | Mathematical Reasoning, Multiple-choice | —Unverified | 0 |
| 1bit-Merging: Dynamic Quantized Merging for Large Language Models | Feb 15, 2025 | Code Generation, Math | —Unverified | 0 |
| HARDMath2: A Benchmark for Applied Mathematics Built by Students as Part of a Graduate Class | May 17, 2025 | Math, Mathematical Problem-Solving | Code Available | 0 |
| Guided Stream of Search: Learning to Better Search with Language Models via Optimal Path Guidance | Oct 3, 2024 | Mathematical Reasoning | Code Available | 0 |
| VerifiAgent: a Unified Verification Agent in Language Model Reasoning | Apr 1, 2025 | Language Modeling, Language Modelling | Code Available | 0 |
| AI-Assisted Generation of Difficult Math Questions | Jul 30, 2024 | Math, Mathematical Reasoning | Code Available | 0 |
| GThinker: Towards General Multimodal Reasoning via Cue-Guided Rethinking | Jun 1, 2025 | 4k, Math | Code Available | 0 |
| VerityMath: Advancing Mathematical Reasoning by Self-Verification Through Unit Consistency | Nov 13, 2023 | Math, Mathematical Reasoning | Code Available | 0 |
| Give me a hint: Can LLMs take a hint to solve math problems? | Oct 8, 2024 | Adversarial Robustness, Math | Code Available | 0 |
| Gap-Filling Prompting Enhances Code-Assisted Mathematical Reasoning | Nov 8, 2024 | Mathematical Reasoning | Code Available | 0 |
| Frugal LMs Trained to Invoke Symbolic Solvers Achieve Parameter-Efficient Arithmetic Reasoning | Dec 9, 2023 | Arithmetic Reasoning, Mathematical Reasoning | Code Available | 0 |
| Accelerate Parallelizable Reasoning via Parallel Decoding within One Sequence | Mar 26, 2025 | Mathematical Reasoning | Code Available | 0 |
| FRoG: Evaluating Fuzzy Reasoning of Generalized Quantifiers in Large Language Models | Jul 1, 2024 | Mathematical Reasoning | Code Available | 0 |
| Explanation Selection Using Unlabeled Data for Chain-of-Thought Prompting | Feb 9, 2023 | Mathematical Reasoning, Natural Language Inference | Code Available | 0 |
| Critical-Questions-of-Thought: Steering LLM reasoning with Argumentative Querying | Dec 19, 2024 | Math, Mathematical Reasoning | Code Available | 0 |
| Compositional Processing Emerges in Neural Networks Solving Math Problems | May 19, 2021 | Math, Mathematical Reasoning | Code Available | 0 |
| SWI: Speaking with Intent in Large Language Models | Mar 27, 2025 | Mathematical Reasoning, Question Answering | Code Available | 0 |
| ComplexFormer: Disruptively Advancing Transformer Inference Ability via Head-Specific Complex Vector Attention | May 15, 2025 | Code Generation, Language Modeling | Code Available | 0 |
| Process-based Self-Rewarding Language Models | Mar 5, 2025 | Mathematical Reasoning | Code Available | 0 |
| PSPO*: An Effective Process-supervised Policy Optimization for Reasoning Alignment | Nov 18, 2024 | Mathematical Reasoning | Code Available | 0 |
| Agentic-R1: Distilled Dual-Strategy Reasoning | Jul 8, 2025 | Mathematical Reasoning | Code Available | 0 |
| Probability-Consistent Preference Optimization for Enhanced LLM Reasoning | May 29, 2025 | Mathematical Reasoning | Code Available | 0 |
| Reasoning over Uncertain Text by Generative Large Language Models | Feb 14, 2024 | Decision Making, Mathematical Reasoning | Code Available | 0 |