SOTAVerified

Math

Papers

Showing 651–675 of 1596 papers

Title | Status | Hype
Generative Discovery of Partial Differential Equations by Learning from Math Handbooks | | 0
Scalable LLM Math Reasoning Acceleration with Low-rank Distillation | | 0
Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers | | 0
SIMPLEMIX: Frustratingly Simple Mixing of Off- and On-policy Data in Language Model Preference Learning | | 0
A Survey of Slow Thinking-based Reasoning LLMs using Reinforced Learning and Inference-time Scaling Law | | 0
Generating Narrated Lecture Videos from Slides with Synchronized Highlights | | 0
LookAlike: Consistent Distractor Generation in Math MCQs | | 0
TutorGym: A Testbed for Evaluating AI Agents as Tutors and Students | Code | 0
AdaptMI: Adaptive Skill-based In-context Math Instruction for Small Language Models | | 0
Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math | | 0
Phi-4-reasoning Technical Report | | 0
LLMs Do Not Have Human-Like Working Memory | | 0
Local Prompt Optimization | | 0
Trace-of-Thought Prompting: Investigating Prompt-Based Knowledge Distillation Through Question Decomposition | | 0
Accurate and Diverse LLM Mathematical Reasoning via Automated PRM-Guided GFlowNets | | 0
APE-Bench I: Towards File-level Automated Proof Engineering of Formal Math Libraries | | 0
Evaluating Grounded Reasoning by Code-Assisted Large Language Models for Mathematics | | 0
Training Large Language Models to Reason via EM Policy Gradient | | 0
SplitReason: Learning To Offload Reasoning | | 0
DianJin-R1: Evaluating and Enhancing Financial Reasoning in Large Language Models | | 0
Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators | Code | 0
LongPerceptualThoughts: Distilling System-2 Reasoning for System-1 Perception | | 0
OTC: Optimal Tool Calls via Reinforcement Learning | | 0
Enhancing Math Learning in an LMS Using AI-Driven Question Recommendations | | 0
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? | | 0
Page 27 of 64

No leaderboard results yet.