SOTAVerified

Math

Papers

Showing 751–800 of 1596 papers

| Title | Status | Hype |
| --- | --- | --- |
| From Euler to AI: Unifying Formulas for Mathematical Constants | Code | 0 |
| SBSC: Step-By-Step Coding for Improving Mathematical Olympiad Performance | | 0 |
| DISC: Dynamic Decomposition Improves LLM Inference Scaling | | 0 |
| Inference Computation Scaling for Feature Augmentation in Recommendation Systems | | 0 |
| Does Reasoning Introduce Bias? A Study of Social Bias Evaluation and Mitigation in LLM Reasoning | | 0 |
| The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer | Code | 0 |
| Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective | Code | 0 |
| CER: Confidence Enhanced Reasoning in LLMs | Code | 0 |
| GATE: Graph-based Adaptive Tool Evolution Across Diverse Tasks | Code | 0 |
| A Survey on Feedback-based Multi-step Reasoning for Large Language Models on Mathematics | | 0 |
| BeamLoRA: Beam-Constraint Low-Rank Adaptation | | 0 |
| DiffSampling: Enhancing Diversity and Accuracy in Neural Text Generation | | 0 |
| The Self-Improvement Paradox: Can Language Models Bootstrap Reasoning Capabilities without External Scaffolding? | | 0 |
| TreeCut: A Synthetic Unanswerable Math Word Problem Dataset for LLM Hallucination Evaluation | Code | 0 |
| None of the Others: a General Technique to Distinguish Reasoning from Memorization in Multiple-Choice LLM Evaluation Benchmarks | | 0 |
| NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions | | 0 |
| Multi-Step Alignment as Markov Games: An Optimistic Online Gradient Descent Approach with Convergence Guarantees | | 0 |
| Thinking Outside the (Gray) Box: A Context-Based Score for Assessing Value and Originality in Neural Text Generation | | 0 |
| Lean-ing on Quality: How High-Quality Data Beats Diverse Multilingual Data in AutoFormalization | | 0 |
| Why Vision Language Models Struggle with Visual Arithmetic? Towards Enhanced Chart and Geometry Understanding | | 0 |
| Energy-Conscious LLM Decoding: Impact of Text Generation Strategies on GPU Energy Consumption | | 0 |
| A Study on Leveraging Search and Self-Feedback for Agent Reasoning | | 0 |
| MathFimer: Enhancing Mathematical Reasoning by Expanding Reasoning Steps through Fill-in-the-Middle Task | | 0 |
| Hypothesis-Driven Theory-of-Mind Reasoning for Large Language Models | | 0 |
| Warmup-Distill: Bridge the Distribution Mismatch between Teacher and Student before Knowledge Distillation | Code | 0 |
| Scaling Test-Time Compute Without Verification or RL is Suboptimal | | 0 |
| Teaching LLMs According to Their Aptitude: Adaptive Reasoning for Mathematical Problem Solving | | 0 |
| Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls | Code | 0 |
| Graders should cheat: privileged information enables expert-level automated evaluations | | 0 |
| 1bit-Merging: Dynamic Quantized Merging for Large Language Models | | 0 |
| CRANE: Reasoning with constrained LLM generation | | 0 |
| MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency | | 0 |
| Interactive Sketchpad: A Multimodal Tutoring System for Collaborative, Visual Problem-Solving | | 0 |
| Mathematical Reasoning in Large Language Models: Assessing Logical and Arithmetic Errors across Wide Numerical Ranges | Code | 0 |
| O1 Embedder: Let Retrievers Think Before Action | | 0 |
| Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning | Code | 0 |
| MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations | | 0 |
| Evolving LLMs' Self-Refinement Capability via Iterative Preference Optimization | | 0 |
| BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation | | 0 |
| Entropy Adaptive Decoding: Dynamic Model Switching for Efficient Inference | | 0 |
| Upweighting Easy Samples in Fine-Tuning Mitigates Forgetting | Code | 0 |
| Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2 | | 0 |
| Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment | | 0 |
| Premise-Augmented Reasoning Chains Improve Error Identification in Math reasoning with LLMs | | 0 |
| SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model | | 0 |
| Learning Autonomous Code Integration for Math Language Models | | 0 |
| Rethinking Mixture-of-Agents: Is Mixing Different Large Language Models Beneficial? | | 0 |
| Blink of an eye: a simple theory for feature localization in generative models | | 0 |
| BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning | | 0 |
| Spend Wisely: Maximizing Post-Training Gains in Iterative Synthetic Data Boostrapping | Code | 0 |
Page 16 of 32

No leaderboard results yet.