SOTAVerified

Math

Papers

Showing 51-75 of 1596 papers

Title | Status | Hype
Vision Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning | Code | 2
ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs | Code | 1
Resa: Transparent Reasoning Models via SAEs | Code | 1
TTT-Bench: A Benchmark for Evaluating Reasoning Ability with Simple and Novel Tic-Tac-Toe-style Games | | 0
LeanTutor: A Formally-Verified AI Tutor for Mathematical Proofs | | 0
Learning to Reason Across Parallel Samples for LLM Reasoning | | 0
Reinforce LLM Reasoning through Multi-Agent Reflection | | 0
SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning | Code | 1
Enhancing Reasoning Capabilities of Small Language Models with Blueprints and Prompt Template Search | | 0
AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions | Code | 2
WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning | Code | 1
Play to Generalize: Learning to Reason Through Game Play | Code | 2
Guideline Forest: Experience-Induced Multi-Guideline Reasoning with Stepwise Aggregation | | 0
The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity | | 0
AgentSwift: Efficient LLM Agent Design via Value-guided Hierarchical Search | Code | 0
SPARQ: Synthetic Problem Generation for Reasoning via Quality-Diversity Algorithms | | 0
Spectral Derivatives | Code | 0
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models | | 0
Mathematical Reasoning for Unmanned Aerial Vehicles: A RAG-Based Approach for Complex Arithmetic Reasoning | Code | 0
Automatic Robustness Stress Testing of LLMs as Mathematical Problem Solvers | | 0
Simulating LLM-to-LLM Tutoring for Multilingual Math Feedback | | 0
TreeRPO: Tree Relative Policy Optimization | Code | 0
MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning | Code | 2
Perceptual Decoupling for Scalable Multi-modal Reasoning via Reward-Optimized Captioning | | 0
Guided Speculative Inference for Efficient Test-Time Alignment of LLMs | Code | 0
Page 3 of 64

No leaderboard results yet.