SOTAVerified

Math

Papers

Showing 26-50 of 1596 papers

Title | Status | Hype
Plan for Speed -- Dilated Scheduling for Masked Diffusion Language Models | - | 0
ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs | Code | 0
Confucius3-Math: A Lightweight High-Performance Reasoning LLM for Chinese K-12 Mathematics Learning | Code | 2
Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective | Code | 1
Shrinking the Generation-Verification Gap with Weak Verifiers | - | 0
Leveraging LLMs to Assess Tutor Moves in Real-Life Dialogues: A Feasibility Study | - | 0
No Free Lunch: Rethinking Internal Feedback for LLM Reasoning | - | 0
OJBench: A Competition Level Code Benchmark For Large Language Models | Code | 1
AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need | Code | 0
Utility-Driven Speculative Decoding for Mixture-of-Experts | - | 0
Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team | Code | 1
Essential-Web v1.0: 24T tokens of organized web data | Code | 2
SIRI-Bench: Challenging VLMs' Spatial Intelligence through Complex Reasoning Tasks | - | 0
AceReason-Nemotron 1.1: Advancing Math and Code Reasoning through SFT and RL Synergy | - | 0
Direct Reasoning Optimization: LLMs Can Reward And Refine Their Own Reasoning for Open-Ended Tasks | - | 0
Steering LLM Thinking with Budget Guidance | Code | 1
Adaptive Guidance Accelerates Reinforcement Learning of Reasoning Models | - | 0
Weakest Link in the Chain: Security Vulnerabilities in Advanced Reasoning Models | - | 0
VGR: Visual Grounded Reasoning | - | 0
Agent-RLVR: Training Software Engineering Agents via Guidance and Environment Rewards | - | 0
TreeRL: LLM Reinforcement Learning with On-Policy Tree Search | Code | 2
Learning a Continue-Thinking Token for Enhanced Test-Time Scaling | Code | 0
Spurious Rewards: Rethinking Training Signals in RLVR | Code | 3
ReCUT: Balancing Reasoning Length and Accuracy in LLMs via Stepwise Trails and Preference Optimization | Code | 0
RePO: Replay-Enhanced Policy Optimization | Code | 1

No leaderboard results yet.