SOTAVerified

Math

Papers

Showing 401-425 of 1596 papers

| Title | Status | Hype |
| --- | --- | --- |
| Uncovering the Impact of Chain-of-Thought Reasoning for Direct Preference Optimization: Lessons from Text-to-SQL | Code | 1 |
| Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls | Code | 0 |
| Enhancing Cross-Tokenizer Knowledge Distillation with Contextual Dynamical Mapping | Code | 1 |
| Graders should cheat: privileged information enables expert-level automated evaluations | | 0 |
| Dyve: Thinking Fast and Slow for Dynamic Process Verification | Code | 1 |
| 1bit-Merging: Dynamic Quantized Merging for Large Language Models | | 0 |
| MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency | | 0 |
| CRANE: Reasoning with constrained LLM generation | | 0 |
| Interactive Sketchpad: A Multimodal Tutoring System for Collaborative, Visual Problem-Solving | | 0 |
| Mathematical Reasoning in Large Language Models: Assessing Logical and Arithmetic Errors across Wide Numerical Ranges | Code | 0 |
| Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving | Code | 3 |
| LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters! | Code | 7 |
| Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning | Code | 0 |
| O1 Embedder: Let Retrievers Think Before Action | | 0 |
| CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction | Code | 4 |
| Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling | Code | 3 |
| MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations | | 0 |
| Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning | Code | 2 |
| On the Emergence of Thinking in LLMs I: Searching for the Right Intuition | Code | 2 |
| ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates | Code | 4 |
| Evolving LLMs' Self-Refinement Capability via Iterative Preference Optimization | | 0 |
| GSM-Infinite: How Do Your LLMs Behave over Infinitely Increasing Context Length and Reasoning Complexity? | Code | 2 |
| BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation | | 0 |
| Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2 | | 0 |
| Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment | | 0 |

No leaderboard results yet.