Math

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 401–425 of 1596 papers

Title	Date	Tasks	Status	Hype
Uncovering the Impact of Chain-of-Thought Reasoning for Direct Preference Optimization: Lessons from Text-to-SQL	Feb 17, 2025	Code GenerationMath	CodeCode Available	1
Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls	Feb 16, 2025	Computational EfficiencyGSM8K	CodeCode Available	0
Enhancing Cross-Tokenizer Knowledge Distillation with Contextual Dynamical Mapping	Feb 16, 2025	Code GenerationInstruction Following	CodeCode Available	1
Graders should cheat: privileged information enables expert-level automated evaluations	Feb 16, 2025	Math	—Unverified	0
Dyve: Thinking Fast and Slow for Dynamic Process Verification	Feb 16, 2025	Math	CodeCode Available	1
1bit-Merging: Dynamic Quantized Merging for Large Language Models	Feb 15, 2025	Code GenerationMath	—Unverified	0
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency	Feb 13, 2025	BenchmarkingMath	—Unverified	0
CRANE: Reasoning with constrained LLM generation	Feb 13, 2025	Code GenerationMath	—Unverified	0
Interactive Sketchpad: A Multimodal Tutoring System for Collaborative, Visual Problem-Solving	Feb 12, 2025	Mathmultimodal interaction	—Unverified	0
Mathematical Reasoning in Large Language Models: Assessing Logical and Arithmetic Errors across Wide Numerical Ranges	Feb 12, 2025	GSM8KMath	CodeCode Available	0
Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving	Feb 11, 2025	Automated Theorem ProvingLarge Language Model	CodeCode Available	3
LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters!	Feb 11, 2025	Large Language ModelMath	CodeCode Available	7
Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning	Feb 11, 2025	Code GenerationMath	CodeCode Available	0
O1 Embedder: Let Retrievers Think Before Action	Feb 11, 2025	Contrastive LearningMath	—Unverified	0
CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction	Feb 11, 2025	Code GenerationMath	CodeCode Available	4
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling	Feb 10, 2025	Math	CodeCode Available	3
MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations	Feb 10, 2025	BenchmarkingIn-Context Learning	—Unverified	0
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning	Feb 10, 2025	MathMathematical Reasoning	CodeCode Available	2
On the Emergence of Thinking in LLMs I: Searching for the Right Intuition	Feb 10, 2025	Math	CodeCode Available	2
ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates	Feb 10, 2025	Hierarchical Reinforcement LearningLanguage Modeling	CodeCode Available	4
Evolving LLMs' Self-Refinement Capability via Iterative Preference Optimization	Feb 8, 2025	GSM8KMath	—Unverified	0
GSM-Infinite: How Do Your LLMs Behave over Infinitely Increasing Context Length and Reasoning Complexity?	Feb 7, 2025	8kInformation Retrieval	CodeCode Available	2
BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation	Feb 6, 2025	In-Context LearningKnowledge Distillation	—Unverified	0
Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2	Feb 5, 2025	Language ModelingLanguage Modelling	—Unverified	0
Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment	Feb 5, 2025	GSM8KHumanEval	—Unverified	0

Show:10 25 50

← PrevPage 17 of 64Next →

No leaderboard results yet.