SOTAVerified

Math

Papers

Showing 1276–1300 of 1596 papers

| Title | Status | Hype |
| --- | --- | --- |
| Divide-and-Conquer Meets Consensus: Unleashing the Power of Functions in Code Generation | | 0 |
| dMath: A Scalable Linear Algebra and Math Library for Heterogeneous GP-GPU Architectures | | 0 |
| dMath: Distributed Linear Algebra for DL | | 0 |
| Does Reasoning Emerge? Examining the Probabilities of Causation in Large Language Models | | 0 |
| Does Reasoning Introduce Bias? A Study of Social Bias Evaluation and Mitigation in LLM Reasoning | | 0 |
| Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? | | 0 |
| Does Representation Intervention Really Identify Desired Concepts and Elicit Alignment? | | 0 |
| TelecomGPT: A Framework to Build Telecom-Specfic Large Language Models | | 0 |
| Do Large Language Models Truly Grasp Mathematics? An Empirical Exploration From Cognitive Psychology | | 0 |
| Dolphin: A Spoken Language Proficiency Assessment System for Elementary Education | | 0 |
| Done Is Better than Perfect: Unlocking Efficient Reasoning by Structured Multi-Turn Decomposition | | 0 |
| Temperature and Persona Shape LLM Agent Consensus With Minimal Accuracy Gains in Qualitative Coding | | 0 |
| Walk Before You Run! Concise LLM Reasoning via Reinforcement Learning | | 0 |
| Do Thinking Tokens Help or Trap? Towards More Efficient Large Reasoning Model | | 0 |
| DrawEduMath: Evaluating Vision Language Models with Expert-Annotated Students' Hand-Drawn Math Images | | 0 |
| Cascaded Self-Evaluation Augmented Training for Efficient Multimodal Large Language Models | | 0 |
| Can you hear me now? Sensitive comparisons of human and machine perception | | 0 |
| Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces | | 0 |
| DynaMath: A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models | | 0 |
| Can We Further Elicit Reasoning in LLMs? Critic-Guided Planning with Retrieval-Augmentation for Solving Challenging Tasks | | 0 |
| Testing GPT-4-o1-preview on math and science problems: A follow-up study | | 0 |
| Dynamic Scheduling of MPI-based Distributed Deep Learning Training Jobs | | 0 |
| Dynamic Skill Adaptation for Large Language Models | | 0 |
| Testing GPT-4 with Wolfram Alpha and Code Interpreter plug-ins on math and science problems | | 0 |
| EasyMath: A 0-shot Math Benchmark for SLMs | | 0 |
Page 52 of 64

No leaderboard results yet.