SOTAVerified

Math

Papers

Showing 201-250 of 1596 papers

Title | Status | Hype
Meta Prompting for AI Systems | Code | 2
MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning | Code | 2
Multi-View Reasoning: Consistent Contrastive Learning for Math Word Problem | Code | 2
DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving | Code | 2
Measuring Multimodal Mathematical Reasoning with MATH-Vision Dataset | Code | 2
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models | Code | 2
MegaMath: Pushing the Limits of Open Math Corpora | Code | 2
Measuring Mathematical Problem Solving With the MATH Dataset | Code | 2
An Expression Tree Decoding Strategy for Mathematical Equation Generation | Code | 2
Memorizing Transformers | Code | 2
Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models | Code | 2
Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate | Code | 2
MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data | Code | 2
JudgeBench: A Benchmark for Evaluating LLM-based Judges | Code | 2
Cumulative Reasoning with Large Language Models | Code | 2
A Comparative Study on Reasoning Patterns of OpenAI's o1 Model | Code | 2
MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts | Code | 2
Meta-Design Matters: A Self-Design Multi-Agent System | Code | 2
Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving | Code | 2
Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic | Code | 2
CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for Task-Aware Parameter-Efficient Fine-tuning | Code | 2
MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark | Code | 2
Confucius3-Math: A Lightweight High-Performance Reasoning LLM for Chinese K-12 Mathematics Learning | Code | 2
MAS-Zero: Designing Multi-Agent Systems with Zero Supervision | Code | 2
MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code | Code | 2
Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch | Code | 2
Efficient Reinforcement Finetuning via Adaptive Curriculum Learning | Code | 2
Archon: An Architecture Search Framework for Inference-Time Techniques | Code | 2
AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions | Code | 2
URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics | Code | 2
CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models | Code | 2
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning | Code | 2
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models | Code | 2
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning | Code | 2
Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs | Code | 1
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models | Code | 1
LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks | Code | 1
A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods | Code | 1
Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning? | Code | 1
LogQuant: Log-Distributed 2-Bit Quantization of KV Cache with Superior Accuracy Preservation | Code | 1
Can an AI Win Ghana's National Science and Maths Quiz? An AI Grand Challenge for Education | Code | 1
A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning | Code | 1
LLMThinkBench: Towards Basic Math Reasoning and Overthinking in Large Language Models | Code | 1
CoMAT: Chain of Mathematically Annotated Thought Improves Mathematical Reasoning | Code | 1
MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models | Code | 1
Building Dataset for Grounding of Formulae — Annotating Coreference Relations Among Math Identifiers | Code | 1
Broken Neural Scaling Laws | Code | 1
Let's Verify Math Questions Step by Step | Code | 1
Agent-X: Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks | Code | 1
Brilla AI: AI Contestant for the National Science and Maths Quiz | Code | 1

No leaderboard results yet.