SOTAVerified

Math

Papers

Showing 776–800 of 1596 papers

| Title | Status | Hype |
|---|---|---|
| Scaling Test-Time Compute Without Verification or RL is Suboptimal | | 0 |
| Teaching LLMs According to Their Aptitude: Adaptive Reasoning for Mathematical Problem Solving | | 0 |
| Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls | Code | 0 |
| Graders should cheat: privileged information enables expert-level automated evaluations | | 0 |
| 1bit-Merging: Dynamic Quantized Merging for Large Language Models | | 0 |
| CRANE: Reasoning with constrained LLM generation | | 0 |
| MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency | | 0 |
| Interactive Sketchpad: A Multimodal Tutoring System for Collaborative, Visual Problem-Solving | | 0 |
| Mathematical Reasoning in Large Language Models: Assessing Logical and Arithmetic Errors across Wide Numerical Ranges | Code | 0 |
| O1 Embedder: Let Retrievers Think Before Action | | 0 |
| Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning | Code | 0 |
| MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations | | 0 |
| Evolving LLMs' Self-Refinement Capability via Iterative Preference Optimization | | 0 |
| BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation | | 0 |
| Entropy Adaptive Decoding: Dynamic Model Switching for Efficient Inference | | 0 |
| Upweighting Easy Samples in Fine-Tuning Mitigates Forgetting | Code | 0 |
| Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2 | | 0 |
| Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment | | 0 |
| Premise-Augmented Reasoning Chains Improve Error Identification in Math reasoning with LLMs | | 0 |
| SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model | | 0 |
| Learning Autonomous Code Integration for Math Language Models | | 0 |
| Rethinking Mixture-of-Agents: Is Mixing Different Large Language Models Beneficial? | | 0 |
| Blink of an eye: a simple theory for feature localization in generative models | | 0 |
| BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning | | 0 |
| Spend Wisely: Maximizing Post-Training Gains in Iterative Synthetic Data Boostrapping | Code | 0 |
Page 32 of 64

No leaderboard results yet.