SOTAVerified

Math

Papers

Showing 851-875 of 1596 papers

| Title | Status | Hype |
|---|---|---|
| VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context | Code | 1 |
| MAmmoTH2: Scaling Instructions from the Web | | 0 |
| Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning | Code | 2 |
| Assessing and Verifying Task Utility in LLM-Powered Applications | | 0 |
| Math Multiple Choice Question Generation via Human-Large Language Model Collaboration | | 0 |
| GOLD: Geometry Problem Solver with Natural Language Description | Code | 1 |
| A Careful Examination of Large Language Model Performance on Grade School Arithmetic | | 0 |
| Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning | Code | 3 |
| Self-Refine Instruction-Tuning for Aligning Reasoning in Language Models | | 0 |
| Iterative Reasoning Preference Optimization | | 0 |
| PECC: Problem Extraction and Coding Challenges | Code | 1 |
| Small Language Models Need Strong Verifiers to Self-Correct Reasoning | Code | 0 |
| LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding | Code | 3 |
| AI Coders Are Among Us: Rethinking Programming Language Grammar Towards Efficient Code Generation | Code | 1 |
| Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems | Code | 1 |
| Describe-then-Reason: Improving Multimodal Mathematical Reasoning through Visual Comprehension Training | | 0 |
| MARIO Eval: Evaluate Your Math LLM with your Math LLM--A mathematical dataset evaluation toolkit | Code | 5 |
| PARAMANU-GANITA: Language Model with Mathematical Capabilities | | 0 |
| Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone | | 0 |
| Improving Automated Distractor Generation for Math Multiple-choice Questions with Overgenerate-and-rank | | 0 |
| Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing | Code | 1 |
| On the Empirical Complexity of Reasoning and Planning in LLMs | | 0 |
| Self-Explore: Enhancing Mathematical Reasoning in Language Models with Fine-grained Rewards | Code | 2 |
| Mental Stress Detection: Development and Evaluation of a Wearable In-Ear Plethysmography | | 0 |
| Rho-1: Not All Tokens Are What You Need | Code | 3 |
Page 35 of 64

No leaderboard results yet.