SOTAVerified

Math

Papers

Showing 401–450 of 1596 papers

| Title | Status | Hype |
|---|---|---|
| Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities | Code | 1 |
| Dyve: Thinking Fast and Slow for Dynamic Process Verification | Code | 1 |
| Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls | Code | 0 |
| Graders should cheat: privileged information enables expert-level automated evaluations | | 0 |
| Enhancing Cross-Tokenizer Knowledge Distillation with Contextual Dynamical Mapping | Code | 1 |
| 1bit-Merging: Dynamic Quantized Merging for Large Language Models | | 0 |
| MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency | | 0 |
| CRANE: Reasoning with constrained LLM generation | | 0 |
| Interactive Sketchpad: A Multimodal Tutoring System for Collaborative, Visual Problem-Solving | | 0 |
| Mathematical Reasoning in Large Language Models: Assessing Logical and Arithmetic Errors across Wide Numerical Ranges | Code | 0 |
| LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters! | Code | 7 |
| Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving | Code | 3 |
| O1 Embedder: Let Retrievers Think Before Action | | 0 |
| CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction | Code | 4 |
| Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning | Code | 0 |
| Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling | Code | 3 |
| MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations | | 0 |
| On the Emergence of Thinking in LLMs I: Searching for the Right Intuition | Code | 2 |
| ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates | Code | 4 |
| Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning | Code | 2 |
| Evolving LLMs' Self-Refinement Capability via Iterative Preference Optimization | | 0 |
| GSM-Infinite: How Do Your LLMs Behave over Infinitely Increasing Context Length and Reasoning Complexity? | Code | 2 |
| BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation | | 0 |
| Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2 | | 0 |
| Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment | | 0 |
| Upweighting Easy Samples in Fine-Tuning Mitigates Forgetting | Code | 0 |
| Entropy Adaptive Decoding: Dynamic Model Switching for Efficient Inference | | 0 |
| LIMO: Less is More for Reasoning | Code | 5 |
| Do Large Language Model Benchmarks Test Reliability? | Code | 1 |
| SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model | | 0 |
| Premise-Augmented Reasoning Chains Improve Error Identification in Math reasoning with LLMs | | 0 |
| Process Reinforcement through Implicit Rewards | Code | 5 |
| A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods | Code | 1 |
| Blink of an eye: a simple theory for feature localization in generative models | | 0 |
| Learning Autonomous Code Integration for Math Language Models | | 0 |
| Rethinking Mixture-of-Agents: Is Mixing Different Large Language Models Beneficial? | | 0 |
| UGPhysics: A Comprehensive Benchmark for Undergraduate Physics Reasoning with Large Language Models | Code | 2 |
| Fairshare Data Pricing via Data Valuation for Large Language Models | | 0 |
| BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning | | 0 |
| s1: Simple test-time scaling | Code | 9 |
| Pheromone-based Learning of Optimal Reasoning Paths | | 0 |
| Spend Wisely: Maximizing Post-Training Gains in Iterative Synthetic Data Boostrapping | Code | 0 |
| PixelWorld: Towards Perceiving Everything as Pixels | | 0 |
| Examining the Robustness of Large Language Models across Language Complexity | | 0 |
| Efficient Neural Theorem Proving via Fine-grained Proof Structure Analysis | Code | 1 |
| Token-Hungry, Yet Precise: DeepSeek R1 Highlights the Need for Multi-Step Reasoning Over Speed in MATH | | 0 |
| Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate | Code | 2 |
| Token-by-Token Regeneration and Domain Biases: A Benchmark of LLMs on Advanced Mathematical Problem-Solving | | 0 |
| Error Classification of Large Language Models on Math Word Problems: A Dynamically Adaptive Framework | | 0 |
| Clear Preferences Leave Traces: Reference Model-Guided Sampling for Preference Learning | | 0 |

No leaderboard results yet.