SOTAVerified

Math

Papers

Showing 376–400 of 1596 papers

| Title | Status | Hype |
| --- | --- | --- |
| PECC: Problem Extraction and Coding Challenges | Code | 1 |
| AI Coders Are Among Us: Rethinking Programming Language Grammar Towards Efficient Code Generation | Code | 1 |
| Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems | Code | 1 |
| Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing | Code | 1 |
| Benchmarking Large Language Models for Persian: A Preliminary Study Focusing on ChatGPT | Code | 1 |
| What is in Your Safe Data? Identifying Benign Data that Breaks Safety | Code | 1 |
| Don't Trust: Verify -- Grounding LLM Quantitative Reasoning with Autoformalization | Code | 1 |
| Memory-Efficient and Secure DNN Inference on TrustZone-enabled Consumer IoT Devices | Code | 1 |
| The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models? | Code | 1 |
| Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models | Code | 1 |
| Brilla AI: AI Contestant for the National Science and Maths Quiz | Code | 1 |
| Improving the Validity of Automatically Generated Feedback via Reinforcement Learning | Code | 1 |
| Case-Based or Rule-Based: How Do Transformers Do the Math? | Code | 1 |
| Stepwise Self-Consistent Mathematical Reasoning with Large Language Models | Code | 1 |
| MATHWELL: Generating Educational Math Word Problems Using Teacher Annotations | Code | 1 |
| ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models | Code | 1 |
| Language Models as Science Tutors | Code | 1 |
| GeoEval: Benchmark for Evaluating LLMs and Multi-Modal Models on Geometry Problem-Solving | Code | 1 |
| MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data | Code | 1 |
| Understanding Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggregation | Code | 1 |
| MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models | Code | 1 |
| ReGAL: Refactoring Programs to Discover Generalizable Abstractions | Code | 1 |
| TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks | Code | 1 |
| Over-Reasoning and Redundant Calculation of Large Language Models | Code | 1 |
| Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning | Code | 1 |
Page 16 of 64

No leaderboard results yet.