SOTAVerified

Math

Papers

Showing 201-250 of 1596 papers

Title | Status | Hype
A Survey of Deep Learning for Mathematical Reasoning | Code | 2
MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning | Code | 2
ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline | Code | 2
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models | Code | 2
Meta Prompting for AI Systems | Code | 2
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities | Code | 2
Offline Reinforcement Learning for LLM Multi-Step Reasoning | Code | 2
Measuring Multimodal Mathematical Reasoning with MATH-Vision Dataset | Code | 2
An Expression Tree Decoding Strategy for Mathematical Equation Generation | Code | 2
MegaMath: Pushing the Limits of Open Math Corpora | Code | 2
Full Page Handwriting Recognition via Image to Sequence Extraction | Code | 2
Measuring Mathematical Problem Solving With the MATH Dataset | Code | 2
Memorizing Transformers | Code | 2
MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data | Code | 2
MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts | Code | 2
Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap | Code | 2
A Comparative Study on Reasoning Patterns of OpenAI's o1 Model | Code | 2
Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models | Code | 2
Meta-Design Matters: A Self-Design Multi-Agent System | Code | 2
MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark | Code | 2
MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code | Code | 2
Balancing LoRA Performance and Efficiency with Simple Shard Sharing | Code | 2
Flaming-hot Initiation with Regular Execution Sampling for Large Language Models | Code | 2
FastCuRL: Curriculum Reinforcement Learning with Progressive Context Extension for Efficient Training R1-like Reasoning Models | Code | 2
MAS-Zero: Designing Multi-Agent Systems with Zero Supervision | Code | 2
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning | Code | 2
Archon: An Architecture Search Framework for Inference-Time Techniques | Code | 2
Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization | Code | 2
Enhancing Reasoning Capabilities of LLMs via Principled Synthetic Logic Corpus | Code | 2
AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions | Code | 2
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning | Code | 2
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models | Code | 2
Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models | Code | 2
S^2R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning | Code | 2
Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective | Code | 1
EXAONE Deep: Reasoning Enhanced Language Models | Code | 1
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models | Code | 1
Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs | Code | 1
A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods | Code | 1
Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning? | Code | 1
Mathfish: Evaluating Language Model Math Reasoning via Grounding in Educational Curricula | Code | 1
LogQuant: Log-Distributed 2-Bit Quantization of KV Cache with Superior Accuracy Preservation | Code | 1
Can an AI Win Ghana's National Science and Maths Quiz? An AI Grand Challenge for Education | Code | 1
A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning | Code | 1
LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks | Code | 1
LLMThinkBench: Towards Basic Math Reasoning and Overthinking in Large Language Models | Code | 1
Evaluating and Improving Tool-Augmented Computation-Intensive Math Reasoning | Code | 1
MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models | Code | 1
Explaining Datasets in Words: Statistical Models with Natural Language Parameters | Code | 1
Building Dataset for Grounding of Formulae — Annotating Coreference Relations Among Math Identifiers | Code | 1

No leaderboard results yet.