| WebChoreArena: Evaluating Web Browsing Agents on Realistic Tedious Web Tasks | Jun 2, 2025 | Large Language Model, Mathematical Reasoning | —Unverified | 0 | 0 |
| What Are Step-Level Reward Models Rewarding? Counterintuitive Findings from MCTS-Boosted Mathematical Reasoning | Dec 20, 2024 | Mathematical Reasoning | —Unverified | 0 | 0 |
| Why are NLP Models Fumbling at Elementary Math? A Survey of Automatic Word Problem Solvers | Jan 16, 2022 | Math, Mathematical Reasoning | —Unverified | 0 | 0 |
| Why are NLP Models Fumbling at Elementary Math? A Survey of Deep Learning based Word Problem Solvers | May 31, 2022 | Math, Mathematical Reasoning | —Unverified | 0 | 0 |
| WirelessMathBench: A Mathematical Modeling Benchmark for LLMs in Wireless Communications | May 20, 2025 | Mathematical Reasoning, Multiple-choice | —Unverified | 0 | 0 |
| 1bit-Merging: Dynamic Quantized Merging for Large Language Models | Feb 15, 2025 | Code Generation, Math | —Unverified | 0 | 0 |
| You Need to Pay Better Attention: Rethinking the Mathematics of Attention Mechanism | Mar 3, 2024 | Machine Translation, Mathematical Reasoning | —Unverified | 0 | 0 |
| MathLearner: A Large Language Model Agent Framework for Learning to Solve Mathematical Problems | Aug 3, 2024 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| AAPO: Enhance the Reasoning Capabilities of LLMs with Advantage Momentum | May 20, 2025 | Mathematical Reasoning, Reinforcement Learning (RL) | —Unverified | 0 | 0 |
| A Careful Examination of Large Language Model Performance on Grade School Arithmetic | May 1, 2024 | GSM8K, Language Modeling | —Unverified | 0 | 0 |
| Accurate and Diverse LLM Mathematical Reasoning via Automated PRM-Guided GFlowNets | Apr 28, 2025 | Data Augmentation, Diversity | —Unverified | 0 | 0 |
| A Comprehensive Evaluation of Large Language Models on Temporal Event Forecasting | Jul 16, 2024 | Mathematical Reasoning, Question Answering | —Unverified | 0 | 0 |
| ActiveDPO: Active Direct Preference Optimization for Sample-Efficient Alignment | May 25, 2025 | Code Generation, Mathematical Reasoning | —Unverified | 0 | 0 |
| AdapThink: Adaptive Thinking Preferences for Reasoning Language Model | Jun 23, 2025 | Diversity, Language Modeling | —Unverified | 0 | 0 |
| AdaSwitch: Adaptive Switching between Small and Large Agents for Effective Cloud-Local Collaborative Learning | Oct 17, 2024 | Mathematical Reasoning, Question Answering | —Unverified | 0 | 0 |
| Advancing Mathematical Reasoning in Language Models: The Impact of Problem-Solving Data, Data Synthesis Methods, and Training Stages | Jan 23, 2025 | Instruction Following, Math | —Unverified | 0 | 0 |
| Adventures in Mathematical Reasoning | Aug 20, 2020 | Mathematical Reasoning | —Unverified | 0 | 0 |
| Agent-as-a-Service based on Agent Network | May 13, 2025 | Code Generation, Mathematical Reasoning | —Unverified | 0 | 0 |
| Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning | Apr 28, 2025 | Mathematical Reasoning | —Unverified | 0 | 0 |
| A Graph-Based Synthetic Data Pipeline for Scaling High-Quality Reasoning Instructions | Dec 12, 2024 | GSM8K, Knowledge Graphs | —Unverified | 0 | 0 |
| AI4Math: A Native Spanish Benchmark for University-Level Mathematical Reasoning in Large Language Models | May 25, 2025 | Math, Mathematical Reasoning | —Unverified | 0 | 0 |
| Aligning Tutor Discourse Supporting Rigorous Thinking with Tutee Content Mastery for Predicting Math Achievement | May 10, 2024 | Math, Mathematical Reasoning | —Unverified | 0 | 0 |
| Amplify Adjacent Token Differences: Enhancing Long Chain-of-Thought Reasoning with Shift-FFN | May 22, 2025 | Mathematical Reasoning | —Unverified | 0 | 0 |
| Anomaly Detection of Tabular Data Using LLMs | Jun 24, 2024 | Anomaly Detection, Long-Context Understanding | —Unverified | 0 | 0 |
| Applications of Positive Unlabeled (PU) and Negative Unlabeled (NU) Learning in Cybersecurity | Dec 9, 2024 | Intrusion Detection, Malware Detection | —Unverified | 0 | 0 |