Math

Papers

Showing 151–200 of 1596 papers

Title	Date	Tasks	Status	Hype
EquivPruner: Boosting Efficiency and Quality in LLM-Based Search via Action Pruning	May 22, 2025	GSM8K, Math	Code Available	0
SATURN: SAT-based Reinforcement Learning to Unleash Language Model Reasoning	May 22, 2025	Language Modeling, Language Modelling	Code Available	0
Veracity Bias and Beyond: Uncovering LLMs' Hidden Beliefs in Problem-Solving Reasoning	May 22, 2025	Attribute, Math	Unverified	0
Unlearning Isn't Deletion: Investigating Reversibility of Machine Unlearning in LLMs	May 22, 2025	Diagnostic, Machine Unlearning	Code Available	1
X-MAS: Towards Building Multi-Agent Systems with Heterogeneous LLMs	May 22, 2025	Chatbot, Math	Code Available	0
Training Step-Level Reasoning Verifiers with Formal Verification Tools	May 21, 2025	Formal Logic, Math	Code Available	1
Towards Spoken Mathematical Reasoning: Benchmarking Speech-based Models over Multi-faceted Math Problems	May 21, 2025	Benchmarking, Math	Unverified	0
MAPS: A Multilingual Benchmark for Global Agent Performance and Security	May 21, 2025	Code Generation, Math	Unverified	0
How Should We Enhance the Safety of Large Reasoning Models: An Empirical Study	May 21, 2025	Math	Code Available	0
Can LLMs understand Math? -- Exploring the Pitfalls in Mathematical Reasoning	May 21, 2025	Math, Mathematical Reasoning	Unverified	0
Meta-Design Matters: A Self-Design Multi-Agent System	May 21, 2025	Math, Problem Decomposition	Code Available	2
RL Tango: Reinforcing Generator and Verifier Together for Language Reasoning	May 21, 2025	Math, Mathematical Reasoning	Code Available	2
Learning to Rank Chain-of-Thought: An Energy-Based Approach with Outcome Supervision	May 21, 2025	GSM8K, Learning-To-Rank	Unverified	0
ModelingAgent: Bridging LLMs and Mathematical Modeling for Real-World Challenges	May 21, 2025	Math, valid	Code Available	1
SSR: Speculative Parallel Scaling Reasoning in Test-time	May 21, 2025	Diversity, Math	Unverified	0
Thought-Augmented Policy Optimization: Bridging External Guidance and Internal Capabilities	May 21, 2025	Math, Reinforcement Learning (RL)	Unverified	0
The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning	May 21, 2025	Math	Code Available	1
MIRB: Mathematical Information Retrieval Benchmark	May 21, 2025	Automated Theorem Proving, Information Retrieval	Code Available	0
EasyMath: A 0-shot Math Benchmark for SLMs	May 20, 2025	Math	Unverified	0
Unearthing Gems from Stones: Policy Optimization with Negative Sample Augmentation for LLM Reasoning	May 20, 2025	Math, Offline RL	Unverified	0
RL of Thoughts: Navigating LLM Reasoning with Inference-time Reinforcement Learning	May 20, 2025	Math, Reinforcement Learning (RL)	Unverified	0
Let's Verify Math Questions Step by Step	May 20, 2025	Math, Mathematical Reasoning	Code Available	1
TinyV: Reducing False Negatives in Verification Improves RL for LLM Reasoning	May 20, 2025	Math, Reinforcement Learning (RL)	Code Available	1
The Hallucination Tax of Reinforcement Finetuning	May 20, 2025	Hallucination, Math	Unverified	0
General-Reasoner: Advancing LLM Reasoning Across All Domains	May 20, 2025	All, Math	Code Available	3
Warm Up Before You Train: Unlocking General Reasoning in Resource-Constrained Settings	May 19, 2025	HumanEval, Math	Code Available	0
Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space	May 19, 2025	GSM8K, Math	Code Available	2
AutoMathKG: The automated mathematical knowledge graph based on LLM and vector database	May 19, 2025	Data Augmentation, In-Context Learning	Unverified	0
AdaptThink: Reasoning Models Can Learn When to Think	May 19, 2025	Math	Code Available	2
Thinkless: LLM Learns When to Think	May 19, 2025	GSM8K, Math	Code Available	3
MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision	May 19, 2025	Math, Mathematical Reasoning	Code Available	4
Synthetic Data RL: Task Definition Is All You Need	May 18, 2025	All, GSM8K	Code Available	2
Efficient RL Training for Reasoning Models via Length-Aware Optimization	May 18, 2025	Math	Code Available	1
SEED-GRPO: Semantic Entropy Enhanced GRPO for Uncertainty-Aware Policy Optimization	May 18, 2025	Math, Mathematical Reasoning	Unverified	0
MARGE: Improving Math Reasoning for LLMs with Guided Exploration	May 18, 2025	Math, Mathematical Reasoning	Code Available	0
HARDMath2: A Benchmark for Applied Mathematics Built by Students as Part of a Graduate Class	May 17, 2025	Math, Mathematical Problem-Solving	Code Available	0
LoRASuite: Efficient LoRA Adaptation Across Large Language Model Upgrades	May 17, 2025	Language Modeling, Language Modelling	Unverified	0
MoL for LLMs: Dual-Loss Optimization to Enhance Domain Expertise While Preserving General Capabilities	May 17, 2025	Math	Unverified	0
HALO: Hierarchical Autonomous Logic-Oriented Orchestration for Multi-Agent LLM Systems	May 17, 2025	Arithmetic Reasoning, Code Generation	Code Available	1
MedCaseReasoning: Evaluating and learning diagnostic reasoning from clinical case reports	May 16, 2025	Diagnostic, Math	Code Available	1
Critique-Guided Distillation: Improving Supervised Fine-tuning via Better Distillation	May 16, 2025	Math, MMLU	Unverified	0
SelfBudgeter: Adaptive Token Allocation for Efficient LLM Reasoning	May 16, 2025	Math	Unverified	0
HAPO: Training Language Models to Reason Concisely via History-Aware Policy Optimization	May 16, 2025	Math	Code Available	0
MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning	May 15, 2025	cross-modal alignment, Geometry Problem Solving	Code Available	3
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models	May 15, 2025	Math, reinforcement-learning	Code Available	2
Towards a Deeper Understanding of Reasoning Capabilities in Large Language Models	May 15, 2025	Large Language Model, Math	Code Available	0
Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models	May 15, 2025	Code Generation, GSM8K	Unverified	0
DIF: A Framework for Benchmarking and Verifying Implicit Bias in LLMs	May 15, 2025	Benchmarking, Fairness	Unverified	0
PT-MoE: An Efficient Finetuning Framework for Integrating Mixture-of-Experts into Prompt Tuning	May 14, 2025	Math, Mathematical Problem-Solving	Code Available	0
Measurement to Meaning: A Validity-Centered Framework for AI Evaluation	May 13, 2025	Math	Unverified	0

