| WebChoreArena: Evaluating Web Browsing Agents on Realistic Tedious Web Tasks | Jun 2, 2025 | Large Language Model, Mathematical Reasoning | —Unverified | 0 | 0 |
| What Are Step-Level Reward Models Rewarding? Counterintuitive Findings from MCTS-Boosted Mathematical Reasoning | Dec 20, 2024 | Mathematical Reasoning | —Unverified | 0 | 0 |
| Why are NLP Models Fumbling at Elementary Math? A Survey of Automatic Word Problem Solvers | Jan 16, 2022 | Math, Mathematical Reasoning | —Unverified | 0 | 0 |
| Why are NLP Models Fumbling at Elementary Math? A Survey of Deep Learning based Word Problem Solvers | May 31, 2022 | Math, Mathematical Reasoning | —Unverified | 0 | 0 |
| WirelessMathBench: A Mathematical Modeling Benchmark for LLMs in Wireless Communications | May 20, 2025 | Mathematical Reasoning, Multiple-choice | —Unverified | 0 | 0 |
| 1bit-Merging: Dynamic Quantized Merging for Large Language Models | Feb 15, 2025 | Code Generation, Math | —Unverified | 0 | 0 |
| You Need to Pay Better Attention: Rethinking the Mathematics of Attention Mechanism | Mar 3, 2024 | Machine Translation, Mathematical Reasoning | —Unverified | 0 | 0 |
| MathLearner: A Large Language Model Agent Framework for Learning to Solve Mathematical Problems | Aug 3, 2024 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| AAPO: Enhance the Reasoning Capabilities of LLMs with Advantage Momentum | May 20, 2025 | Mathematical Reasoning, Reinforcement Learning (RL) | —Unverified | 0 | 0 |
| A Careful Examination of Large Language Model Performance on Grade School Arithmetic | May 1, 2024 | GSM8K, Language Modeling | —Unverified | 0 | 0 |
| Accurate and Diverse LLM Mathematical Reasoning via Automated PRM-Guided GFlowNets | Apr 28, 2025 | Data Augmentation, Diversity | —Unverified | 0 | 0 |
| A Comprehensive Evaluation of Large Language Models on Temporal Event Forecasting | Jul 16, 2024 | Mathematical Reasoning, Question Answering | —Unverified | 0 | 0 |
| ActiveDPO: Active Direct Preference Optimization for Sample-Efficient Alignment | May 25, 2025 | Code Generation, Mathematical Reasoning | —Unverified | 0 | 0 |
| AdapThink: Adaptive Thinking Preferences for Reasoning Language Model | Jun 23, 2025 | Diversity, Language Modeling | —Unverified | 0 | 0 |
| AdaSwitch: Adaptive Switching between Small and Large Agents for Effective Cloud-Local Collaborative Learning | Oct 17, 2024 | Mathematical Reasoning, Question Answering | —Unverified | 0 | 0 |
| Advancing Mathematical Reasoning in Language Models: The Impact of Problem-Solving Data, Data Synthesis Methods, and Training Stages | Jan 23, 2025 | Instruction Following, Math | —Unverified | 0 | 0 |
| Adventures in Mathematical Reasoning | Aug 20, 2020 | Mathematical Reasoning | —Unverified | 0 | 0 |
| Agent-as-a-Service based on Agent Network | May 13, 2025 | Code Generation, Mathematical Reasoning | —Unverified | 0 | 0 |
| Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning | Apr 28, 2025 | Mathematical Reasoning | —Unverified | 0 | 0 |
| A Graph-Based Synthetic Data Pipeline for Scaling High-Quality Reasoning Instructions | Dec 12, 2024 | GSM8K, Knowledge Graphs | —Unverified | 0 | 0 |
| AI4Math: A Native Spanish Benchmark for University-Level Mathematical Reasoning in Large Language Models | May 25, 2025 | Math, Mathematical Reasoning | —Unverified | 0 | 0 |
| Aligning Tutor Discourse Supporting Rigorous Thinking with Tutee Content Mastery for Predicting Math Achievement | May 10, 2024 | Math, Mathematical Reasoning | —Unverified | 0 | 0 |
| Amplify Adjacent Token Differences: Enhancing Long Chain-of-Thought Reasoning with Shift-FFN | May 22, 2025 | Mathematical Reasoning | —Unverified | 0 | 0 |
| Anomaly Detection of Tabular Data Using LLMs | Jun 24, 2024 | Anomaly Detection, Long-Context Understanding | —Unverified | 0 | 0 |
| Applications of Positive Unlabeled (PU) and Negative Unlabeled (NU) Learning in Cybersecurity | Dec 9, 2024 | Intrusion Detection, Malware Detection | —Unverified | 0 | 0 |
| Applying RLAIF for Code Generation with API-usage in Lightweight LLMs | Jun 28, 2024 | Code Generation, Hallucination | —Unverified | 0 | 0 |
| Apriori Knowledge in an Era of Computational Opacity: The Role of AI in Mathematical Discovery | Mar 15, 2024 | Mathematical Reasoning | —Unverified | 0 | 0 |
| Are Large Language Models Robust in Understanding Code Against Semantics-Preserving Mutations? | May 15, 2025 | Mathematical Reasoning | —Unverified | 0 | 0 |
| Assessing GPT4-V on Structured Reasoning Tasks | Dec 13, 2023 | Code Generation, Language Modeling | —Unverified | 0 | 0 |
| Evaluating LLMs' Mathematical Reasoning in Financial Document Question Answering | Feb 17, 2024 | Arithmetic Reasoning, Mathematical Reasoning | —Unverified | 0 | 0 |
| Assessing Robustness to Spurious Correlations in Post-Training Language Models | May 9, 2025 | Instruction Following, Mathematical Reasoning | —Unverified | 0 | 0 |
| Assessing the Emergent Symbolic Reasoning Abilities of Llama Large Language Models | Jun 5, 2024 | Mathematical Reasoning | —Unverified | 0 | 0 |
| Assessing the Impact of Prompting Methods on ChatGPT's Mathematical Capabilities | Dec 22, 2023 | Chatbot, GSM8K | —Unverified | 0 | 0 |
| A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges | Dec 16, 2024 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| A Survey on Large Language Models for Mathematical Reasoning | Jun 10, 2025 | Answer Generation, Mathematical Reasoning | —Unverified | 0 | 0 |
| A Symbolic Framework for Evaluating Mathematical Reasoning and Generalisation with Transformers | May 21, 2023 | Mathematical Reasoning | —Unverified | 0 | 0 |
| A Systematic Evaluation of Large Language Models for Natural Language Generation Tasks | May 16, 2024 | Code Generation, Dialogue Generation | —Unverified | 0 | 0 |
| A Systematic Survey on Large Language Models for Algorithm Design | Oct 11, 2024 | Mathematical Reasoning, scientific discovery | —Unverified | 0 | 0 |
| A Technical Study into Small Reasoning Language Models | Jun 16, 2025 | Code Generation, Computational Efficiency | —Unverified | 0 | 0 |
| Augmenting In-Context-Learning in LLMs via Automatic Data Labeling and Refinement | Oct 14, 2024 | In-Context Learning, Mathematical Reasoning | —Unverified | 0 | 0 |
| AutoGeo: Automating Geometric Image Dataset Creation for Enhanced Geometry Understanding | Aug 28, 2024 | Mathematical Reasoning | —Unverified | 0 | 0 |
| AutoGPS: Automated Geometry Problem Solving via Multimodal Formalization and Deductive Reasoning | May 29, 2025 | Geometry Problem Solving, Mathematical Reasoning | —Unverified | 0 | 0 |
| AutoMathKG: The automated mathematical knowledge graph based on LLM and vector database | May 19, 2025 | Data Augmentation, In-Context Learning | —Unverified | 0 | 0 |
| Forward-Backward Reasoning in Large Language Models for Mathematical Verification | Aug 15, 2023 | Mathematical Reasoning | —Unverified | 0 | 0 |
| Basis Selection: Low-Rank Decomposition of Pretrained Large Language Models for Target Applications | May 24, 2024 | Code Generation, Low-rank compression | —Unverified | 0 | 0 |
| Benchmarking Large Language Models via Random Variables | Jan 20, 2025 | Benchmarking, Mathematical Reasoning | —Unverified | 0 | 0 |
| Benchmarking Large Language Models with Integer Sequence Generation Tasks | Nov 7, 2024 | Benchmarking, Computational Efficiency | —Unverified | 0 | 0 |
| Better Process Supervision with Bi-directional Rewarding Signals | Mar 6, 2025 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning | Jun 5, 2025 | Mathematical Reasoning, Problem Decomposition | —Unverified | 0 | 0 |
| Beyond Captioning: Task-Specific Prompting for Improved VLM Performance in Mathematical Reasoning | Oct 8, 2024 | Image Retrieval, Math | —Unverified | 0 | 0 |