SOTAVerified

Math

Papers

Showing 301-325 of 1596 papers

Title | Status | Hype
LogQuant: Log-Distributed 2-Bit Quantization of KV Cache with Superior Accuracy Preservation | Code | 1
Automatic Generation of Socratic Subquestions for Teaching Math Word Problems | Code | 1
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models | Code | 1
Let's Verify Math Questions Step by Step | Code | 1
Leveraging Online Olympiad-Level Math Problems for LLMs Training and Contamination-Resistant Evaluation | Code | 1
A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers | Code | 1
LEVER: Learning to Verify Language-to-Code Generation with Execution | Code | 1
Learning Goal-Conditioned Representations for Language Reward Models | Code | 1
Learning Multi-Step Reasoning by Solving Arithmetic Tasks | Code | 1
Decoupled Visual Interpretation and Linguistic Reasoning for Math Problem Solving | Code | 1
Learning Math Reasoning from Self-Sampled Correct and Partially-Correct Solutions | Code | 1
Learning to Reason Deductively: Math Word Problem Solving as Complex Relation Extraction | Code | 1
DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback | Code | 1
LASeR: Learning to Adaptively Select Reward Models with Multi-Armed Bandits | Code | 1
Autoformalize Mathematical Statements by Symbolic Equivalence and Semantic Consistency | Code | 1
Large Language Models Can Be Easily Distracted by Irrelevant Context | Code | 1
AutoBencher: Creating Salient, Novel, Difficult Datasets for Language Models | Code | 1
Augmenting Math Word Problems via Iterative Question Composing | Code | 1
Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency | Code | 1
Large (Vision) Language Models are Unsupervised In-Context Learners | Code | 1
Learning by Fixing: Solving Math Word Problems with Weak Supervision | Code | 1
Language Models as Science Tutors | Code | 1
Language Models Encode the Value of Numbers Linearly | Code | 1
A Tree-Structured Decoder for Image-to-Markup Generation | Code | 1
Non-myopic Generation of Language Models for Reasoning and Planning | Code | 1
Page 13 of 64

No leaderboard results yet.