SOTAVerified

Math

Papers

Showing 551–600 of 1596 papers

Title | Status | Hype
Guideline Forest: Experience-Induced Multi-Guideline Reasoning with Stepwise Aggregation | - | 0
The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity | - | 0
AgentSwift: Efficient LLM Agent Design via Value-guided Hierarchical Search | Code | 0
Spectral Derivatives | Code | 0
SPARQ: Synthetic Problem Generation for Reasoning via Quality-Diversity Algorithms | - | 0
Perceptual Decoupling for Scalable Multi-modal Reasoning via Reward-Optimized Captioning | - | 0
Automatic Robustness Stress Testing of LLMs as Mathematical Problem Solvers | - | 0
Simulating LLM-to-LLM Tutoring for Multilingual Math Feedback | - | 0
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models | - | 0
TreeRPO: Tree Relative Policy Optimization | Code | 0
Mathematical Reasoning for Unmanned Aerial Vehicles: A RAG-Based Approach for Complex Arithmetic Reasoning | Code | 0
Guided Speculative Inference for Efficient Test-Time Alignment of LLMs | Code | 0
Rectified Sparse Attention | - | 0
Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem | - | 0
MASTER: Enhancing Large Language Model via Multi-Agent Simulated Teaching | - | 0
SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis | - | 0
Invariance Makes LLM Unlearning Resilient Even to Unanticipated Downstream Fine-Tuning | Code | 0
Knowledge or Reasoning? A Close Look at How LLMs Think Across Domains | - | 0
GThinker: Towards General Multimodal Reasoning via Cue-Guided Rethinking | Code | 0
Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models | Code | 0
Accelerated Sampling from Masked Diffusion Models via Entropy Bounded Unmasking | - | 0
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning | - | 0
Can LLMs Reason Abstractly Over Math Word Problems Without CoT? Disentangling Abstract Formulation From Arithmetic Computation | - | 0
Let's Reason Formally: Natural-Formal Hybrid Reasoning Enhances LLM's Math Capability | - | 0
Discriminative Policy Optimization for Token-Level Reward Models | Code | 0
DINGO: Constrained Inference for Diffusion LLMs | - | 0
LLM Performance for Code Generation on Noisy Tasks | Code | 0
PBEBench: A Multi-Step Programming by Examples Reasoning Benchmark inspired by Historical Linguistics | - | 0
Matryoshka Model Learning for Improved Elastic Student Models | - | 0
Infi-MMR: Curriculum-based Unlocking Multimodal Reasoning via Phased Reinforcement Learning in Multimodal Small Language Models | - | 0
Decomposing Elements of Problem Solving: What "Math" Does RL Teach? | Code | 0
ASyMOB: Algebraic Symbolic Mathematical Operations Benchmark | Code | 0
Maximizing Confidence Alone Improves Reasoning | - | 0
Walk Before You Run! Concise LLM Reasoning via Reinforcement Learning | - | 0
Error Typing for Smarter Rewards: Improving Process Reward Models with Error-Aware Hierarchical Supervision | Code | 0
Which Data Attributes Stimulate Math and Code Reasoning? An Investigation via Influence Functions | - | 0
Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles | - | 0
Inference-time Alignment in Continuous Space | Code | 0
Done Is Better than Perfect: Unlocking Efficient Reasoning by Structured Multi-Turn Decomposition | - | 0
Hard Negative Contrastive Learning for Fine-Grained Geometric Understanding in Large Multimodal Models | Code | 0
Prismatic Synthesis: Gradient-based Data Diversification Boosts Generalization in LLM Reasoning | - | 0
Improving Multilingual Math Reasoning for African Languages | - | 0
The Role of Diversity in In-Context Learning for Large Language Models | - | 0
Interleaved Reasoning for Large Language Models via Reinforcement Learning | - | 0
Faster and Better LLMs via Latency-Aware Test-Time Scaling | - | 0
AI4Math: A Native Spanish Benchmark for University-Level Mathematical Reasoning in Large Language Models | - | 0
MMATH: A Multilingual Benchmark for Mathematical Reasoning | Code | 0
Enumerate-Conjecture-Prove: Formally Solving Answer-Construction Problems in Math Competitions | Code | 0
Steering LLM Reasoning Through Bias-Only Adaptation | - | 0
Does Representation Intervention Really Identify Desired Concepts and Elicit Alignment? | - | 0
Page 12 of 32

No leaderboard results yet.