SOTAVerified

Math

Papers

Showing 251-300 of 1596 papers

Title | Status | Hype
MathPhys-Guided Coarse-to-Fine Anomaly Synthesis with SQE-Driven Bi-Level Optimization for Anomaly Detection | - | 0
Rethinking the Generation of High-Quality CoT Data from the Perspective of LLM-Adaptive Question Difficulty Grading | - | 0
Entropy-Guided Watermarking for LLMs: A Test-Time Framework for Robust and Traceable Text Generation | - | 0
Reinforcement Learning from Human Feedback | Code | 5
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs | - | 0
Fine-Tuning Large Language Models on Quantum Optimization Problems for Circuit Generation | Code | 1
Heimdall: test-time scaling on the generative verification | - | 0
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models | Code | 1
Efficient Process Reward Model Training via Active Learning | Code | 1
The Jailbreak Tax: How Useful are Your Jailbreak Outputs? | Code | 1
Syzygy of Thoughts: Improving LLM CoT with the Minimal Free Resolution | Code | 3
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning | Code | 2
Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory | Code | 3
Supervised Optimism Correction: Be Confident When LLMs Are Sure | - | 0
Task-Circuit Quantization: Leveraging Knowledge Localization and Interpretability for Compression | Code | 1
GPT Carry-On: Training Foundation Model for Customization Could Be Simple, Scalable and Affordable | - | 0
MDIT: A Model-free Data Interpolation Method for Diverse Instruction Tuning | - | 0
MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models | Code | 1
Right Question is Already Half the Answer: Fully Unsupervised LLM Reasoning Incentivization | Code | 2
Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language Models | Code | 2
Synthetic Data Generation & Multi-Step RL for Reasoning & Tool Use | - | 0
Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models | Code | 2
Efficient Reinforcement Finetuning via Adaptive Curriculum Learning | Code | 2
Reasoning Models Know When They're Right: Probing Hidden States for Self-Verification | - | 0
Retro-Search: Exploring Untaken Paths for Deeper and Efficient Reasoning | - | 0
oneDAL Optimization for ARM Scalable Vector Extension: Maximizing Efficiency for High-Performance Data Science | - | 0
Explain with Visual Keypoints Like a Real Mentor! A Benchmark for Multimodal Solution Explanation | - | 0
Online Difficulty Filtering for Reasoning Oriented Reinforcement Learning | - | 0
Large (Vision) Language Models are Unsupervised In-Context Learners | Code | 1
MegaMath: Pushing the Limits of Open Math Corpora | Code | 2
BlenderGym: Benchmarking Foundational Model Systems for Graphics Editing | Code | 1
Cross-Lingual Consistency: A Novel Inference Framework for Advancing Reasoning in Large Language Models | - | 0
How Difficulty-Aware Staged Reinforcement Learning Enhances LLMs' Reasoning Capabilities: A Preliminary Experimental Study | - | 0
GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning | - | 0
Hawkeye: Efficient Reasoning with Model Collaboration | - | 0
Brains vs. Bytes: Evaluating LLM Proficiency in Olympiad Mathematics | - | 0
Investigating Large Language Models in Diagnosing Students' Cognitive Skills in Math Problem-solving | - | 0
Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead | Code | 2
Entropy-Based Adaptive Weighting for Self-Training | Code | 1
An extrapolated and provably convergent algorithm for nonlinear matrix decomposition with the ReLU function | Code | 0
DebFlow: Automating Agent Creation via Agent Debate | - | 0
ToRL: Scaling Tool-Integrated RL | Code | 3
Learning to Reason for Long-Form Story Generation | Code | 2
QuestBench: Can LLMs ask the right question to acquire information in reasoning tasks? | Code | 1
CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models | Code | 2
ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models | Code | 1
Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad | - | 0
Challenging the Boundaries of Reasoning: An Olympiad-Level Math Benchmark for Large Language Models | Code | 0
Effective Skill Unlearning through Intervention and Abstention | Code | 0
Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators | - | 0
Page 6 of 32

No leaderboard results yet.