SOTAVerified

Math

Papers

Showing 1-25 of 1596 papers

Title | Status | Hype
QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation | - | 0
VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks | - | 0
Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training | - | 0
Temperature and Persona Shape LLM Agent Consensus With Minimal Accuracy Gains in Qualitative Coding | - | 0
Personalized Exercise Recommendation with Semantically-Grounded Knowledge Tracing | Code | 0
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination | Code | 1
A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning | Code | 1
Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs | - | 0
Squeeze the Soaked Sponge: Efficient Off-policy Reinforcement Finetuning for Large Language Model | - | 0
CoRE: Enhancing Metacognition with Label-free Self-evaluation in LRMs | - | 0
The Delta Learning Hypothesis: Preference Tuning on Weak Data can Yield Strong Gains | Code | 1
Activation Steering for Chain-of-Thought Compression | Code | 0
LLMThinkBench: Towards Basic Math Reasoning and Overthinking in Large Language Models | Code | 1
EvoAgentX: An Automated Framework for Evolving Agentic Workflows | Code | 7
Effects of structure on reasoning in instance-level Self-Discover | Code | 0
Energy-Based Transformers are Scalable Learners and Thinkers | Code | 4
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning | Code | 2
Do Thinking Tokens Help or Trap? Towards More Efficient Large Reasoning Model | - | 0
Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test | - | 0
Bridging Offline and Online Reinforcement Learning for LLMs | - | 0
Multi-lingual Functional Evaluation for Large Language Models | - | 0
AALC: Large Language Model Efficient Reasoning via Adaptive Accuracy-Length Control | Code | 0
OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling | Code | 2
When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs | - | 0
Causal Decomposition Analysis with Synergistic Interventions: A Triply-Robust Machine Learning Approach to Addressing Multiple Dimensions of Social Disparities | - | 0
Page 1 of 64

No leaderboard results yet.