Mathematical Problem-Solving

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 51–100 of 106 papers

Title	Date	Tasks	Status
Exposing the Achilles' Heel: Evaluating LLMs Ability to Handle Mistakes in Mathematical Reasoning	Jun 16, 2024	BenchmarkingMath	—Unverified
FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning	Oct 8, 2024	GSM8KHallucination	—Unverified
Holistic Capability Preservation: Towards Compact Yet Comprehensive Reasoning Models	Apr 9, 2025	Instruction FollowingMathematical Problem-Solving	—Unverified
How Do Large Language Monkeys Get Their Power (Laws)?	Feb 24, 2025	Language ModelingLanguage Modelling	—Unverified
Improving Small-Scale Large Language Models Function Calling for Reasoning Tasks	Oct 24, 2024	Logical ReasoningMathematical Problem-Solving	—Unverified
JiuZhang 2.0: A Unified Chinese Pre-trained Language Model for Multi-task Mathematical Problem Solving	Jun 19, 2023	In-Context LearningLanguage Modeling	—Unverified
Kwai-STaR: Transform LLMs into State-Transition Reasoners	Nov 7, 2024	GSM8KMathematical Problem-Solving	—Unverified
Large Language Models for Mathematical Reasoning: Progresses and Challenges	Jan 31, 2024	DiversityMath	—Unverified
LearNAT: Learning NL2SQL with AST-guided Task Decomposition for Large Language Models	Apr 3, 2025	Mathematical Problem-SolvingPrompt Engineering	—Unverified
Logic Contrastive Reasoning with Lightweight Large Language Model for Math Word Problems	Aug 29, 2024	GSM8KLanguage Modeling	—Unverified
MathAgent: Leveraging a Mixture-of-Math-Agent Framework for Real-World Multimodal Mathematical Error Detection	Mar 23, 2025	MathMathematical Problem-Solving	—Unverified
MathFimer: Enhancing Mathematical Reasoning by Expanding Reasoning Steps through Fill-in-the-Middle Task	Feb 17, 2025	Code CompletionGSM8K	—Unverified
PersonaMath: Enhancing Math Reasoning through Persona-Driven Data Augmentation	Oct 2, 2024	Data AugmentationDiversity	—Unverified
PoLAR: Polar-Decomposed Low-Rank Adapter Representation	Jun 3, 2025	Mathematical Problem-SolvingRiemannian optimization	—Unverified
Premise Order Matters in Reasoning with Large Language Models	Feb 14, 2024	GSM8KMathematical Problem-Solving	—Unverified
Reasoning Models Can Be Effective Without Thinking	Apr 14, 2025	Automated Theorem ProvingMathematical Problem-Solving	—Unverified
Scaling Autonomous Agents via Automatic Reward Modeling And Planning	Feb 17, 2025	Decision MakingMathematical Problem-Solving	—Unverified
Scaling Laws for Autoregressive Generative Modeling	Oct 28, 2020	Mathematical Problem-Solving	—Unverified
SECURA: Sigmoid-Enhanced CUR Decomposition with Uninterrupted Retention and Low-Rank Adaptation in Large Language Models	Feb 25, 2025	Continual LearningGSM8K	—Unverified
Self-Evolved Preference Optimization for Enhancing Mathematical Reasoning in Small Language Models	Mar 4, 2025	GSM8KMath	—Unverified
SMART: Self-Generating and Self-Validating Multi-Dimensional Assessment for LLMs' Mathematical Problem Solving	May 22, 2025	DiagnosticMathematical Problem-Solving	—Unverified
STRIVE: Structured Reasoning for Self-Improvement in Claim Verification	Feb 17, 2025	Claim VerificationMathematical Problem-Solving	—Unverified
Teaching LLMs According to Their Aptitude: Adaptive Reasoning for Mathematical Problem Solving	Feb 17, 2025	MathMathematical Problem-Solving	—Unverified
TeleMath: A Benchmark for Large Language Models in Telecom Mathematical Problem Solving	Jun 12, 2025	Logical ReasoningMathematical Problem-Solving	—Unverified
The Consensus Game: Language Model Generation via Equilibrium Search	Oct 13, 2023	Language ModelingLanguage Modelling	—Unverified
Three Questions Concerning the Use of Large Language Models to Facilitate Mathematics Learning	Oct 20, 2023	Mathematical Problem-SolvingPosition	—Unverified
Token-by-Token Regeneration and Domain Biases: A Benchmark of LLMs on Advanced Mathematical Problem-Solving	Jan 28, 2025	MathMathematical Problem-Solving	—Unverified
Token-Hungry, Yet Precise: DeepSeek R1 Highlights the Need for Multi-Step Reasoning Over Speed in MATH	Jan 30, 2025	Language ModelingLanguage Modelling	—Unverified
Towards Spoken Mathematical Reasoning: Benchmarking Speech-based Models over Multi-faceted Math Problems	May 21, 2025	BenchmarkingMath	—Unverified
The Buffer Mechanism for Multi-Step Information Reasoning in Language Models	May 24, 2024	Mathematical Problem-Solving	—Unverified
VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning	Oct 30, 2024	BenchmarkingHallucination	—Unverified
Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving	May 20, 2024	GSM8KMath	—Unverified
Mixture-of-Instructions: Comprehensive Alignment of a Large Language Model through the Mixture of Diverse System Prompting Instructions	Apr 29, 2024	Language ModelingLanguage Modelling	—Unverified
Navigating Semantic Relations: Challenges for Language Models in Abstract Common-Sense Reasoning	Feb 19, 2025	Common Sense ReasoningMathematical Problem-Solving	—Unverified
OccamLLM: Fast and Exact Language Model Arithmetic in a Single Step	Jun 4, 2024	Language ModelingLanguage Modelling	—Unverified
On Vanishing Variance in Transformer Length Generalization	Apr 3, 2025	AttributeMathematical Problem-Solving	—Unverified
Performance Comparison of Large Language Models on Advanced Calculus Problems	Mar 5, 2025	MathMathematical Problem-Solving	—Unverified
Mathify: Evaluating Large Language Models on Mathematical Problem Solving Tasks	Apr 19, 2024	Mathematical Problem-Solving	CodeCode Available
MathFlow: Enhancing the Perceptual Flow of MLLMs for Visual Mathematical Problems	Mar 19, 2025	Mathematical Problem-Solving	CodeCode Available
LocationReasoner: Evaluating LLMs on Real-World Site Selection Reasoning	Jun 16, 2025	Code GenerationMathematical Problem-Solving	CodeCode Available
Large Language Models for Mathematical Analysis	Dec 28, 2024	Mathematical Problem-SolvingMathematical Reasoning	CodeCode Available
Decomposing Elements of Problem Solving: What "Math" Does RL Teach?	May 28, 2025	MathMathematical Problem-Solving	CodeCode Available
Data Contamination Through the Lens of Time	Oct 16, 2023	Mathematical Problem-Solving	CodeCode Available
HARDMath2: A Benchmark for Applied Mathematics Built by Students as Part of a Graduate Class	May 17, 2025	MathMathematical Problem-Solving	CodeCode Available
Chain-of-Code Collapse: Reasoning Failures in LLMs via Adversarial Prompting in Code Generation	Jun 8, 2025	Code GenerationMathematical Problem-Solving	CodeCode Available
GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory	Jun 18, 2024	Code GenerationMathematical Problem-Solving	CodeCode Available
Exploring LLM Reasoning Through Controlled Prompt Variations	Apr 2, 2025	GSM8KMathematical Problem-Solving	CodeCode Available
Surrogate Signals from Format and Length: Reinforcement Learning for Solving Mathematical Problems without Ground Truth Answers	May 26, 2025	Logical ReasoningMathematical Problem-Solving	CodeCode Available
Can LLMs Master Math? Investigating Large Language Models on Math Stack Exchange	Mar 30, 2024	MathMathematical Problem-Solving	CodeCode Available
Error Typing for Smarter Rewards: Improving Process Reward Models with Error-Aware Hierarchical Supervision	May 26, 2025	HallucinationMath	CodeCode Available

Show:10 25 50

← PrevPage 2 of 3Next →

No leaderboard results yet.