SOTAVerified

Math

Papers

Showing 226-250 of 1596 papers

Title | Status | Hype
UGPhysics: A Comprehensive Benchmark for Undergraduate Physics Reasoning with Large Language Models | Code | 2
MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code | Code | 2
CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models | Code | 2
Archon: An Architecture Search Framework for Inference-Time Techniques | Code | 2
Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO | Code | 2
AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions | Code | 2
VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment | Code | 2
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning | Code | 2
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models | Code | 2
Collective Constitutional AI: Aligning a Language Model with Public Input | Code | 1
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models | Code | 1
LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks | Code | 1
Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning? | Code | 1
CoMAT: Chain of Mathematically Annotated Thought Improves Mathematical Reasoning | Code | 1
Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs | Code | 1
LLMThinkBench: Towards Basic Math Reasoning and Overthinking in Large Language Models | Code | 1
Can an AI Win Ghana's National Science and Maths Quiz? An AI Grand Challenge for Education | Code | 1
A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning | Code | 1
LogQuant: Log-Distributed 2-Bit Quantization of KV Cache with Superior Accuracy Preservation | Code | 1
MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models | Code | 1
Building Dataset for Grounding of Formulae — Annotating Coreference Relations Among Math Identifiers | Code | 1
Broken Neural Scaling Laws | Code | 1
A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods | Code | 1
Agent-X: Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks | Code | 1
Brilla AI: AI Contestant for the National Science and Maths Quiz | Code | 1
Page 10 of 64

No leaderboard results yet.