SOTAVerified

Math

Papers

Showing 351-400 of 1596 papers

Title | Status | Hype
Cats Confuse Reasoning LLM: Query Agnostic Adversarial Triggers for Reasoning Models | | 0
MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts | | 0
MAMUT: A Novel Framework for Modifying Mathematical Formulas for the Generation of Specialized Datasets for Language Model Training | Code | 0
FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving | Code | 1
Med-RLVR: Emerging Medical Reasoning from a 3B base model via reinforcement Learning | | 0
Self-Training Elicits Concise Reasoning in Large Language Models | Code | 1
Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning? | Code | 1
Nexus: A Lightweight and Scalable Multi-Agent Framework for Complex Tasks Automation | Code | 2
Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning | | 0
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution | | 0
From Euler to AI: Unifying Formulas for Mathematical Constants | Code | 0
Learning Decentralized Swarms Using Rotation Equivariant Graph Neural Networks | Code | 0
Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models | Code | 2
Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning | Code | 0
Reasoning with Latent Thoughts: On the Power of Looped Transformers | | 0
DISC: Dynamic Decomposition Improves LLM Inference Scaling | | 0
SBSC: Step-By-Step Coding for Improving Mathematical Olympiad Performance | | 0
Inference Computation Scaling for Feature Augmentation in Recommendation Systems | | 0
Does Reasoning Introduce Bias? A Study of Social Bias Evaluation and Mitigation in LLM Reasoning | | 0
The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer | Code | 0
Forgotten Polygons: Multimodal Large Language Models are Shape-Blind | Code | 1
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning | Code | 7
S*: Test Time Scaling for Code Generation | Code | 7
GATE: Graph-based Adaptive Tool Evolution Across Diverse Tasks | Code | 0
How to Get Your LLM to Generate Challenging Problems for Evaluation | Code | 1
CER: Confidence Enhanced Reasoning in LLMs | Code | 0
Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective | Code | 0
A Survey on Feedback-based Multi-step Reasoning for Large Language Models on Mathematics | | 0
SIFT: Grounding LLM Reasoning in Contexts via Stickers | Code | 2
BeamLoRA: Beam-Constraint Low-Rank Adaptation | | 0
DiffSampling: Enhancing Diversity and Accuracy in Neural Text Generation | | 0
The Self-Improvement Paradox: Can Language Models Bootstrap Reasoning Capabilities without External Scaffolding? | | 0
TreeCut: A Synthetic Unanswerable Math Word Problem Dataset for LLM Hallucination Evaluation | Code | 0
Reasoning with Reinforced Functional Token Tuning | Code | 1
Lean-ing on Quality: How High-Quality Data Beats Diverse Multilingual Data in AutoFormalization | | 0
Multi-Step Alignment as Markov Games: An Optimistic Online Gradient Descent Approach with Convergence Guarantees | | 0
None of the Others: a General Technique to Distinguish Reasoning from Memorization in Multiple-Choice LLM Evaluation Benchmarks | | 0
S^2R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning | Code | 2
NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions | | 0
Thinking Outside the (Gray) Box: A Context-Based Score for Assessing Value and Originality in Neural Text Generation | | 0
Thinking Preference Optimization | Code | 1
MathFimer: Enhancing Mathematical Reasoning by Expanding Reasoning Steps through Fill-in-the-Middle Task | | 0
Scaling Test-Time Compute Without Verification or RL is Suboptimal | | 0
Teaching LLMs According to Their Aptitude: Adaptive Reasoning for Mathematical Problem Solving | | 0
Energy-Conscious LLM Decoding: Impact of Text Generation Strategies on GPU Energy Consumption | | 0
Why Vision Language Models Struggle with Visual Arithmetic? Towards Enhanced Chart and Geometry Understanding | | 0
A Study on Leveraging Search and Self-Feedback for Agent Reasoning | | 0
Warmup-Distill: Bridge the Distribution Mismatch between Teacher and Student before Knowledge Distillation | Code | 0
Hypothesis-Driven Theory-of-Mind Reasoning for Large Language Models | | 0
Uncovering the Impact of Chain-of-Thought Reasoning for Direct Preference Optimization: Lessons from Text-to-SQL | Code | 1

No leaderboard results yet.