SOTAVerified

Math

Papers

Showing 201–225 of 1596 papers

Title | Status | Hype
MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts | Code | 2
RM-R1: Reward Modeling as Reasoning | Code | 2
MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark | Code | 2
MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code | Code | 2
Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning | Code | 2
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models | Code | 2
Full Page Handwriting Recognition via Image to Sequence Extraction | Code | 2
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning | Code | 2
MACM: Utilizing a Multi-Agent System for Condition Mining in Solving Complex Mathematical Problems | Code | 2
Agent Lumos: Unified and Modular Training for Open-Source Language Agents | Code | 2
Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language Models | Code | 2
Language Models are Multilingual Chain-of-Thought Reasoners | Code | 2
MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning | Code | 2
Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic | Code | 2
Evaluating Mathematical Reasoning Beyond Accuracy | Code | 2
Can AI Assistants Know What They Don't Know? | Code | 2
A Comparative Study on Reasoning Patterns of OpenAI's o1 Model | Code | 2
LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters | Code | 2
CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models | Code | 2
Essential-Web v1.0: 24T tokens of organized web data | Code | 2
Dynamic Early Exit in Reasoning Models | Code | 2
Balancing LoRA Performance and Efficiency with Simple Shard Sharing | Code | 2
LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training | Code | 2
Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap | Code | 2
Learning to Reason for Long-Form Story Generation | Code | 2
Page 9 of 64

No leaderboard results yet.