SOTAVerified

Math

Papers

Showing 51–100 of 1596 papers

| Title | Status | Hype |
| --- | --- | --- |
| Vision Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning | Code | 2 |
| ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs | Code | 1 |
| Resa: Transparent Reasoning Models via SAEs | Code | 1 |
| TTT-Bench: A Benchmark for Evaluating Reasoning Ability with Simple and Novel Tic-Tac-Toe-style Games | | 0 |
| LeanTutor: A Formally-Verified AI Tutor for Mathematical Proofs | | 0 |
| SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning | Code | 1 |
| Learning to Reason Across Parallel Samples for LLM Reasoning | | 0 |
| Reinforce LLM Reasoning through Multi-Agent Reflection | | 0 |
| Enhancing Reasoning Capabilities of Small Language Models with Blueprints and Prompt Template Search | | 0 |
| AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions | Code | 2 |
| Play to Generalize: Learning to Reason Through Game Play | Code | 2 |
| WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning | Code | 1 |
| Guideline Forest: Experience-Induced Multi-Guideline Reasoning with Stepwise Aggregation | | 0 |
| The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity | | 0 |
| AgentSwift: Efficient LLM Agent Design via Value-guided Hierarchical Search | Code | 0 |
| SPARQ: Synthetic Problem Generation for Reasoning via Quality-Diversity Algorithms | | 0 |
| Spectral Derivatives | Code | 0 |
| Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models | | 0 |
| Simulating LLM-to-LLM Tutoring for Multilingual Math Feedback | | 0 |
| MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning | Code | 2 |
| Mathematical Reasoning for Unmanned Aerial Vehicles: A RAG-Based Approach for Complex Arithmetic Reasoning | Code | 0 |
| Automatic Robustness Stress Testing of LLMs as Mathematical Problem Solvers | | 0 |
| TreeRPO: Tree Relative Policy Optimization | Code | 0 |
| Perceptual Decoupling for Scalable Multi-modal Reasoning via Reward-Optimized Captioning | | 0 |
| Guided Speculative Inference for Efficient Test-Time Alignment of LLMs | Code | 0 |
| Rectified Sparse Attention | | 0 |
| OpenThoughts: Data Recipes for Reasoning Models | Code | 7 |
| Generating Pedagogically Meaningful Visuals for Math Word Problems: A New Benchmark and Analysis of Text-to-Image Models | Code | 1 |
| MASTER: Enhancing Large Language Model via Multi-Agent Simulated Teaching | | 0 |
| Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem | | 0 |
| Knowledge or Reasoning? A Close Look at How LLMs Think Across Domains | | 0 |
| Invariance Makes LLM Unlearning Resilient Even to Unanticipated Downstream Fine-Tuning | Code | 0 |
| The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning | Code | 2 |
| SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis | | 0 |
| STORM-BORN: A Challenging Mathematical Derivations Dataset Curated via a Human-in-the-Loop Multi-Agent Framework | Code | 1 |
| GThinker: Towards General Multimodal Reasoning via Cue-Guided Rethinking | Code | 0 |
| SiLVR: A Simple Language-based Video Reasoning Framework | Code | 1 |
| Agent-X: Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks | Code | 1 |
| Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models | Code | 0 |
| Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning | | 0 |
| A*-Thought: Efficient Reasoning via Bidirectional Compression for Low-Resource Settings | Code | 1 |
| Accelerated Sampling from Masked Diffusion Models via Entropy Bounded Unmasking | | 0 |
| Harnessing Negative Signals: Reinforcement Distillation from Teacher Data for LLM Reasoning | Code | 1 |
| AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning | Code | 7 |
| Let's Reason Formally: Natural-Formal Hybrid Reasoning Enhances LLM's Math Capability | | 0 |
| Discriminative Policy Optimization for Token-Level Reward Models | Code | 0 |
| Can LLMs Reason Abstractly Over Math Word Problems Without CoT? Disentangling Abstract Formulation From Arithmetic Computation | | 0 |
| PBEBench: A Multi-Step Programming by Examples Reasoning Benchmark inspired by Historical Linguistics | | 0 |
| Matryoshka Model Learning for Improved Elastic Student Models | | 0 |
| Infi-MMR: Curriculum-based Unlocking Multimodal Reasoning via Phased Reinforcement Learning in Multimodal Small Language Models | | 0 |

No leaderboard results yet.