SOTAVerified

Mathematical Problem-Solving

Papers

Showing 51-100 of 106 papers

| Title | Status | Hype |
| --- | --- | --- |
| Exposing the Achilles' Heel: Evaluating LLMs Ability to Handle Mistakes in Mathematical Reasoning | | 0 |
| FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning | | 0 |
| Holistic Capability Preservation: Towards Compact Yet Comprehensive Reasoning Models | | 0 |
| How Do Large Language Monkeys Get Their Power (Laws)? | | 0 |
| Improving Small-Scale Large Language Models Function Calling for Reasoning Tasks | | 0 |
| JiuZhang 2.0: A Unified Chinese Pre-trained Language Model for Multi-task Mathematical Problem Solving | | 0 |
| Kwai-STaR: Transform LLMs into State-Transition Reasoners | | 0 |
| Large Language Models for Mathematical Reasoning: Progresses and Challenges | | 0 |
| LearNAT: Learning NL2SQL with AST-guided Task Decomposition for Large Language Models | | 0 |
| Logic Contrastive Reasoning with Lightweight Large Language Model for Math Word Problems | | 0 |
| MathAgent: Leveraging a Mixture-of-Math-Agent Framework for Real-World Multimodal Mathematical Error Detection | | 0 |
| MathFimer: Enhancing Mathematical Reasoning by Expanding Reasoning Steps through Fill-in-the-Middle Task | | 0 |
| PersonaMath: Enhancing Math Reasoning through Persona-Driven Data Augmentation | | 0 |
| PoLAR: Polar-Decomposed Low-Rank Adapter Representation | | 0 |
| Premise Order Matters in Reasoning with Large Language Models | | 0 |
| Reasoning Models Can Be Effective Without Thinking | | 0 |
| Scaling Autonomous Agents via Automatic Reward Modeling And Planning | | 0 |
| Scaling Laws for Autoregressive Generative Modeling | | 0 |
| SECURA: Sigmoid-Enhanced CUR Decomposition with Uninterrupted Retention and Low-Rank Adaptation in Large Language Models | | 0 |
| Self-Evolved Preference Optimization for Enhancing Mathematical Reasoning in Small Language Models | | 0 |
| SMART: Self-Generating and Self-Validating Multi-Dimensional Assessment for LLMs' Mathematical Problem Solving | | 0 |
| STRIVE: Structured Reasoning for Self-Improvement in Claim Verification | | 0 |
| Teaching LLMs According to Their Aptitude: Adaptive Reasoning for Mathematical Problem Solving | | 0 |
| TeleMath: A Benchmark for Large Language Models in Telecom Mathematical Problem Solving | | 0 |
| The Consensus Game: Language Model Generation via Equilibrium Search | | 0 |
| Three Questions Concerning the Use of Large Language Models to Facilitate Mathematics Learning | | 0 |
| Token-by-Token Regeneration and Domain Biases: A Benchmark of LLMs on Advanced Mathematical Problem-Solving | | 0 |
| Token-Hungry, Yet Precise: DeepSeek R1 Highlights the Need for Multi-Step Reasoning Over Speed in MATH | | 0 |
| Towards Spoken Mathematical Reasoning: Benchmarking Speech-based Models over Multi-faceted Math Problems | | 0 |
| The Buffer Mechanism for Multi-Step Information Reasoning in Language Models | | 0 |
| VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning | | 0 |
| Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving | | 0 |
| Mixture-of-Instructions: Comprehensive Alignment of a Large Language Model through the Mixture of Diverse System Prompting Instructions | | 0 |
| Navigating Semantic Relations: Challenges for Language Models in Abstract Common-Sense Reasoning | | 0 |
| OccamLLM: Fast and Exact Language Model Arithmetic in a Single Step | | 0 |
| On Vanishing Variance in Transformer Length Generalization | | 0 |
| Performance Comparison of Large Language Models on Advanced Calculus Problems | | 0 |
| Mathify: Evaluating Large Language Models on Mathematical Problem Solving Tasks | Code | 0 |
| MathFlow: Enhancing the Perceptual Flow of MLLMs for Visual Mathematical Problems | Code | 0 |
| LocationReasoner: Evaluating LLMs on Real-World Site Selection Reasoning | Code | 0 |
| Large Language Models for Mathematical Analysis | Code | 0 |
| Decomposing Elements of Problem Solving: What "Math" Does RL Teach? | Code | 0 |
| Data Contamination Through the Lens of Time | Code | 0 |
| HARDMath2: A Benchmark for Applied Mathematics Built by Students as Part of a Graduate Class | Code | 0 |
| Chain-of-Code Collapse: Reasoning Failures in LLMs via Adversarial Prompting in Code Generation | Code | 0 |
| GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory | Code | 0 |
| Exploring LLM Reasoning Through Controlled Prompt Variations | Code | 0 |
| Surrogate Signals from Format and Length: Reinforcement Learning for Solving Mathematical Problems without Ground Truth Answers | Code | 0 |
| Can LLMs Master Math? Investigating Large Language Models on Math Stack Exchange | Code | 0 |
| Error Typing for Smarter Rewards: Improving Process Reward Models with Error-Aware Hierarchical Supervision | Code | 0 |

No leaderboard results yet.