SOTAVerified

Math

Papers

Showing 476–500 of 1596 papers

Title | Status | Hype
Toward Adaptive Reasoning in Large Language Models with Thought Rollback | Code | 1
Building Dataset for Grounding of Formulae — Annotating Coreference Relations Among Math Identifiers | Code | 1
Towards an AI to Win Ghana's National Science and Maths Quiz | Code | 1
Large Language Models Are Neurosymbolic Reasoners | Code | 1
How to Get Your LLM to Generate Challenging Problems for Evaluation | Code | 1
Entropy-Based Adaptive Weighting for Self-Training | Code | 1
MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation | Code | 1
How well do Large Language Models perform in Arithmetic tasks? | Code | 1
Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning | Code | 1
HARDMath: A Benchmark Dataset for Challenging Problems in Applied Mathematics | Code | 1
HARP: A challenging human-annotated math reasoning benchmark | Code | 1
Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems | Code | 1
HALO: Hierarchical Autonomous Logic-Oriented Orchestration for Multi-Agent LLM Systems | Code | 1
Can an AI Win Ghana's National Science and Maths Quiz? An AI Grand Challenge for Education | Code | 1
Generating Pedagogically Meaningful Visuals for Math Word Problems: A New Benchmark and Analysis of Text-to-Image Models | Code | 1
Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning | Code | 1
Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models Aligned with Human Cognitive Principles | Code | 1
EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees | Code | 1
Evaluating and Improving Tool-Augmented Computation-Intensive Math Reasoning | Code | 1
Implicit Chain of Thought Reasoning via Knowledge Distillation | Code | 1
Are NLP Models really able to Solve Simple Math Word Problems? | Code | 1
Case-Based or Rule-Based: How Do Transformers Do the Math? | Code | 1
Graph-to-Tree Neural Networks for Learning Structured Input-Output Translation with Applications to Semantic Parsing and Math Word Problem | Code | 1
CARL-GT: Evaluating Causal Reasoning Capabilities of Large Language Models | Code | 1
AI Coders Are Among Us: Rethinking Programming Language Grammar Towards Efficient Code Generation | Code | 1
Page 20 of 64

No leaderboard results yet.