| WebChoreArena: Evaluating Web Browsing Agents on Realistic Tedious Web Tasks | Jun 2, 2025 | Large Language ModelMathematical Reasoning | —Unverified | 0 | 0 |
| What Are Step-Level Reward Models Rewarding? Counterintuitive Findings from MCTS-Boosted Mathematical Reasoning | Dec 20, 2024 | Mathematical Reasoning | —Unverified | 0 | 0 |
| Why are NLP Models Fumbling at Elementary Math? A Survey of Automatic Word Problem Solvers | Jan 16, 2022 | MathMathematical Reasoning | —Unverified | 0 | 0 |
| Why are NLP Models Fumbling at Elementary Math? A Survey of Deep Learning based Word Problem Solvers | May 31, 2022 | MathMathematical Reasoning | —Unverified | 0 | 0 |
| WirelessMathBench: A Mathematical Modeling Benchmark for LLMs in Wireless Communications | May 20, 2025 | Mathematical ReasoningMultiple-choice | —Unverified | 0 | 0 |
| 1bit-Merging: Dynamic Quantized Merging for Large Language Models | Feb 15, 2025 | Code GenerationMath | —Unverified | 0 | 0 |
| You Need to Pay Better Attention: Rethinking the Mathematics of Attention Mechanism | Mar 3, 2024 | Machine TranslationMathematical Reasoning | —Unverified | 0 | 0 |
| MathLearner: A Large Language Model Agent Framework for Learning to Solve Mathematical Problems | Aug 3, 2024 | Language ModelingLanguage Modelling | —Unverified | 0 | 0 |
| AAPO: Enhance the Reasoning Capabilities of LLMs with Advantage Momentum | May 20, 2025 | Mathematical ReasoningReinforcement Learning (RL) | —Unverified | 0 | 0 |
| A Careful Examination of Large Language Model Performance on Grade School Arithmetic | May 1, 2024 | GSM8KLanguage Modeling | —Unverified | 0 | 0 |