SOTAVerified

Mathematical Problem-Solving

Papers

Showing 76-100 of 106 papers

| Title | Status | Hype |
| --- | --- | --- |
| Three Questions Concerning the Use of Large Language Models to Facilitate Mathematics Learning | | 0 |
| Token-by-Token Regeneration and Domain Biases: A Benchmark of LLMs on Advanced Mathematical Problem-Solving | | 0 |
| Token-Hungry, Yet Precise: DeepSeek R1 Highlights the Need for Multi-Step Reasoning Over Speed in MATH | | 0 |
| Towards Spoken Mathematical Reasoning: Benchmarking Speech-based Models over Multi-faceted Math Problems | | 0 |
| The Buffer Mechanism for Multi-Step Information Reasoning in Language Models | | 0 |
| VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning | | 0 |
| Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving | | 0 |
| Mixture-of-Instructions: Comprehensive Alignment of a Large Language Model through the Mixture of Diverse System Prompting Instructions | | 0 |
| Navigating Semantic Relations: Challenges for Language Models in Abstract Common-Sense Reasoning | | 0 |
| OccamLLM: Fast and Exact Language Model Arithmetic in a Single Step | | 0 |
| On Vanishing Variance in Transformer Length Generalization | | 0 |
| Performance Comparison of Large Language Models on Advanced Calculus Problems | | 0 |
| Mathify: Evaluating Large Language Models on Mathematical Problem Solving Tasks | Code | 0 |
| MathFlow: Enhancing the Perceptual Flow of MLLMs for Visual Mathematical Problems | Code | 0 |
| LocationReasoner: Evaluating LLMs on Real-World Site Selection Reasoning | Code | 0 |
| Large Language Models for Mathematical Analysis | Code | 0 |
| Decomposing Elements of Problem Solving: What "Math" Does RL Teach? | Code | 0 |
| Data Contamination Through the Lens of Time | Code | 0 |
| HARDMath2: A Benchmark for Applied Mathematics Built by Students as Part of a Graduate Class | Code | 0 |
| Chain-of-Code Collapse: Reasoning Failures in LLMs via Adversarial Prompting in Code Generation | Code | 0 |
| GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory | Code | 0 |
| Exploring LLM Reasoning Through Controlled Prompt Variations | Code | 0 |
| Surrogate Signals from Format and Length: Reinforcement Learning for Solving Mathematical Problems without Ground Truth Answers | Code | 0 |
| Can LLMs Master Math? Investigating Large Language Models on Math Stack Exchange | Code | 0 |
| Error Typing for Smarter Rewards: Improving Process Reward Models with Error-Aware Hierarchical Supervision | Code | 0 |
Page 4 of 5

No leaderboard results yet.