SOTAVerified

Math

Papers

Showing 601–625 of 1596 papers

| Title | Status | Hype |
| --- | --- | --- |
| Big Math and the One-Brain Barrier A Position Paper and Architecture Proposal | | 0 |
| DynaMath: A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models | | 0 |
| Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces | | 0 |
| Accurate closed-form solution of the SIR epidemic model | | 0 |
| SelfBudgeter: Adaptive Token Allocation for Efficient LLM Reasoning | | 0 |
| Biased Programmers? Or Biased Data? A Field Experiment in Operationalizing AI Ethics | | 0 |
| DrawEduMath: Evaluating Vision Language Models with Expert-Annotated Students' Hand-Drawn Math Images | | 0 |
| Do Thinking Tokens Help or Trap? Towards More Efficient Large Reasoning Model | | 0 |
| An Improved Coarse-to-Fine Method for Solving Generation Tasks | | 0 |
| A General Retrieval-Augmented Generation Framework for Multimodal Case-Based Reasoning Applications | | 0 |
| JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation | | 0 |
| Done Is Better than Perfect: Unlocking Efficient Reasoning by Structured Multi-Turn Decomposition | | 0 |
| Dolphin: A Spoken Language Proficiency Assessment System for Elementary Education | | 0 |
| Beyond Sentential Semantic Parsing: Tackling the Math SAT with a Cascade of Tree Transducers | | 0 |
| Do Large Language Models Truly Grasp Mathematics? An Empirical Exploration From Cognitive Psychology | | 0 |
| Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models | | 0 |
| Does Representation Intervention Really Identify Desired Concepts and Elicit Alignment? | | 0 |
| Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? | | 0 |
| Accurate and Diverse LLM Mathematical Reasoning via Automated PRM-Guided GFlowNets | | 0 |
| Does Reasoning Introduce Bias? A Study of Social Bias Evaluation and Mitigation in LLM Reasoning | | 0 |
| Does Reasoning Emerge? Examining the Probabilities of Causation in Large Language Models | | 0 |
| Beyond Captioning: Task-Specific Prompting for Improved VLM Performance in Mathematical Reasoning | | 0 |
| LeanTutor: A Formally-Verified AI Tutor for Mathematical Proofs | | 0 |
| Iterative Reasoning Preference Optimization | | 0 |
| Knowledge or Reasoning? A Close Look at How LLMs Think Across Domains | | 0 |

No leaderboard results yet.