SOTAVerified

Math

Papers

Showing 1-50 of 1596 papers

Title | Status | Hype
QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation | - | 0
VAR-MATH: Probing True Mathematical Reasoning in Large Language Models via Symbolic Multi-Instance Benchmarks | - | 0
Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training | - | 0
Personalized Exercise Recommendation with Semantically-Grounded Knowledge Tracing | Code | 0
Temperature and Persona Shape LLM Agent Consensus With Minimal Accuracy Gains in Qualitative Coding | - | 0
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination | Code | 1
A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning | Code | 1
Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs | - | 0
Squeeze the Soaked Sponge: Efficient Off-policy Reinforcement Finetuning for Large Language Model | - | 0
The Delta Learning Hypothesis: Preference Tuning on Weak Data can Yield Strong Gains | Code | 1
CoRE: Enhancing Metacognition with Label-free Self-evaluation in LRMs | - | 0
Activation Steering for Chain-of-Thought Compression | Code | 0
LLMThinkBench: Towards Basic Math Reasoning and Overthinking in Large Language Models | Code | 1
EvoAgentX: An Automated Framework for Evolving Agentic Workflows | Code | 7
Effects of structure on reasoning in instance-level Self-Discover | Code | 0
Energy-Based Transformers are Scalable Learners and Thinkers | Code | 4
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning | Code | 2
Do Thinking Tokens Help or Trap? Towards More Efficient Large Reasoning Model | - | 0
Bridging Offline and Online Reinforcement Learning for LLMs | - | 0
Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test | - | 0
Multi-lingual Functional Evaluation for Large Language Models | - | 0
When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs | - | 0
AALC: Large Language Model Efficient Reasoning via Adaptive Accuracy-Length Control | Code | 0
OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling | Code | 2
Causal Decomposition Analysis with Synergistic Interventions: A Triply-Robust Machine Learning Approach to Addressing Multiple Dimensions of Social Disparities | - | 0
Plan for Speed -- Dilated Scheduling for Masked Diffusion Language Models | - | 0
ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs | Code | 0
Confucius3-Math: A Lightweight High-Performance Reasoning LLM for Chinese K-12 Mathematics Learning | Code | 2
Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective | Code | 1
Shrinking the Generation-Verification Gap with Weak Verifiers | - | 0
Leveraging LLMs to Assess Tutor Moves in Real-Life Dialogues: A Feasibility Study | - | 0
No Free Lunch: Rethinking Internal Feedback for LLM Reasoning | - | 0
OJBench: A Competition Level Code Benchmark For Large Language Models | Code | 1
AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need | Code | 0
Utility-Driven Speculative Decoding for Mixture-of-Experts | - | 0
Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team | Code | 1
Essential-Web v1.0: 24T tokens of organized web data | Code | 2
SIRI-Bench: Challenging VLMs' Spatial Intelligence through Complex Reasoning Tasks | - | 0
AceReason-Nemotron 1.1: Advancing Math and Code Reasoning through SFT and RL Synergy | - | 0
Direct Reasoning Optimization: LLMs Can Reward And Refine Their Own Reasoning for Open-Ended Tasks | - | 0
Steering LLM Thinking with Budget Guidance | Code | 1
Adaptive Guidance Accelerates Reinforcement Learning of Reasoning Models | - | 0
Weakest Link in the Chain: Security Vulnerabilities in Advanced Reasoning Models | - | 0
VGR: Visual Grounded Reasoning | - | 0
Agent-RLVR: Training Software Engineering Agents via Guidance and Environment Rewards | - | 0
TreeRL: LLM Reinforcement Learning with On-Policy Tree Search | Code | 2
Learning a Continue-Thinking Token for Enhanced Test-Time Scaling | Code | 0
Spurious Rewards: Rethinking Training Signals in RLVR | Code | 3
ReCUT: Balancing Reasoning Length and Accuracy in LLMs via Stepwise Trails and Preference Optimization | Code | 0
RePO: Replay-Enhanced Policy Optimization | Code | 1
