Math

Papers

Showing 401–450 of 1596 papers

Title	Date	Tasks	Status	Hype
Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities	Feb 17, 2025	Code Generation, HumanEval	Code Available	1
Dyve: Thinking Fast and Slow for Dynamic Process Verification	Feb 16, 2025	Math	Code Available	1
Don't Get Lost in the Trees: Streamlining LLM Reasoning by Overcoming Tree Search Exploration Pitfalls	Feb 16, 2025	Computational Efficiency, GSM8K	Code Available	0
Graders should cheat: privileged information enables expert-level automated evaluations	Feb 16, 2025	Math	Unverified	0
Enhancing Cross-Tokenizer Knowledge Distillation with Contextual Dynamical Mapping	Feb 16, 2025	Code Generation, Instruction Following	Code Available	1
1bit-Merging: Dynamic Quantized Merging for Large Language Models	Feb 15, 2025	Code Generation, Math	Unverified	0
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency	Feb 13, 2025	Benchmarking, Math	Unverified	0
CRANE: Reasoning with constrained LLM generation	Feb 13, 2025	Code Generation, Math	Unverified	0
Interactive Sketchpad: A Multimodal Tutoring System for Collaborative, Visual Problem-Solving	Feb 12, 2025	Math, multimodal interaction	Unverified	0
Mathematical Reasoning in Large Language Models: Assessing Logical and Arithmetic Errors across Wide Numerical Ranges	Feb 12, 2025	GSM8K, Math	Code Available	0
LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters!	Feb 11, 2025	Large Language Model, Math	Code Available	7
Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving	Feb 11, 2025	Automated Theorem Proving, Large Language Model	Code Available	3
O1 Embedder: Let Retrievers Think Before Action	Feb 11, 2025	Contrastive Learning, Math	Unverified	0
CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction	Feb 11, 2025	Code Generation, Math	Code Available	4
Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning	Feb 11, 2025	Code Generation, Math	Code Available	0
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling	Feb 10, 2025	Math	Code Available	3
MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations	Feb 10, 2025	Benchmarking, In-Context Learning	Unverified	0
On the Emergence of Thinking in LLMs I: Searching for the Right Intuition	Feb 10, 2025	Math	Code Available	2
ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates	Feb 10, 2025	Hierarchical Reinforcement Learning, Language Modeling	Code Available	4
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning	Feb 10, 2025	Math, Mathematical Reasoning	Code Available	2
Evolving LLMs' Self-Refinement Capability via Iterative Preference Optimization	Feb 8, 2025	GSM8K, Math	Unverified	0
GSM-Infinite: How Do Your LLMs Behave over Infinitely Increasing Context Length and Reasoning Complexity?	Feb 7, 2025	8k, Information Retrieval	Code Available	2
BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation	Feb 6, 2025	In-Context Learning, Knowledge Distillation	Unverified	0
Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2	Feb 5, 2025	Language Modeling, Language Modelling	Unverified	0
Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment	Feb 5, 2025	GSM8K, HumanEval	Unverified	0
Upweighting Easy Samples in Fine-Tuning Mitigates Forgetting	Feb 5, 2025	GSM8K, Math	Code Available	0
Entropy Adaptive Decoding: Dynamic Model Switching for Efficient Inference	Feb 5, 2025	Computational Efficiency, Language Modeling	Unverified	0
LIMO: Less is More for Reasoning	Feb 5, 2025	Math, Mathematical Reasoning	Code Available	5
Do Large Language Model Benchmarks Test Reliability?	Feb 5, 2025	Language Modeling, Language Modelling	Code Available	1
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model	Feb 4, 2025	Instruction Following, Language Modeling	Unverified	0
Premise-Augmented Reasoning Chains Improve Error Identification in Math reasoning with LLMs	Feb 4, 2025	Math, Mathematical Reasoning	Unverified	0
Process Reinforcement through Implicit Rewards	Feb 3, 2025	Math, Reinforcement Learning (RL)	Code Available	5
A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods	Feb 3, 2025	Math, Mathematical Reasoning	Code Available	1
Blink of an eye: a simple theory for feature localization in generative models	Feb 2, 2025	Math	Unverified	0
Learning Autonomous Code Integration for Math Language Models	Feb 2, 2025	Math	Unverified	0
Rethinking Mixture-of-Agents: Is Mixing Different Large Language Models Beneficial?	Feb 2, 2025	Math, MMLU	Unverified	0
UGPhysics: A Comprehensive Benchmark for Undergraduate Physics Reasoning with Large Language Models	Feb 1, 2025	Math	Code Available	2
Fairshare Data Pricing via Data Valuation for Large Language Models	Jan 31, 2025	Data Valuation, Math	Unverified	0
BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning	Jan 31, 2025	Language Modeling, Language Modelling	Unverified	0
s1: Simple test-time scaling	Jan 31, 2025	Language Modeling, Language Modelling	Code Available	9
Pheromone-based Learning of Optimal Reasoning Paths	Jan 31, 2025	ARC, GSM8K	Unverified	0
Spend Wisely: Maximizing Post-Training Gains in Iterative Synthetic Data Boostrapping	Jan 31, 2025	Denoising, Image Denoising	Code Available	0
PixelWorld: Towards Perceiving Everything as Pixels	Jan 31, 2025	Math	Unverified	0
Examining the Robustness of Large Language Models across Language Complexity	Jan 30, 2025	Math	Unverified	0
Efficient Neural Theorem Proving via Fine-grained Proof Structure Analysis	Jan 30, 2025	Automated Theorem Proving, Math	Code Available	1
Token-Hungry, Yet Precise: DeepSeek R1 Highlights the Need for Multi-Step Reasoning Over Speed in MATH	Jan 30, 2025	Language Modeling, Language Modelling	Unverified	0
Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate	Jan 29, 2025	Instruction Following, Math	Code Available	2
Token-by-Token Regeneration and Domain Biases: A Benchmark of LLMs on Advanced Mathematical Problem-Solving	Jan 28, 2025	Math, Mathematical Problem-Solving	Unverified	0
Error Classification of Large Language Models on Math Word Problems: A Dynamically Adaptive Framework	Jan 26, 2025	Math, Mathematical Reasoning	Unverified	0
Clear Preferences Leave Traces: Reference Model-Guided Sampling for Preference Learning	Jan 25, 2025	Math	Unverified	0