Math

Papers

Showing 626–650 of 1596 papers

Title	Date	Tasks	Status	Hype
RL of Thoughts: Navigating LLM Reasoning with Inference-time Reinforcement Learning	May 20, 2025	Math, Reinforcement Learning (RL)	Unverified	0
EasyMath: A 0-shot Math Benchmark for SLMs	May 20, 2025	Math	Unverified	0
The Hallucination Tax of Reinforcement Finetuning	May 20, 2025	Hallucination, Math	Unverified	0
Unearthing Gems from Stones: Policy Optimization with Negative Sample Augmentation for LLM Reasoning	May 20, 2025	Math, Offline RL	Unverified	0
Warm Up Before You Train: Unlocking General Reasoning in Resource-Constrained Settings	May 19, 2025	HumanEval, Math	Code Available	0
AutoMathKG: The automated mathematical knowledge graph based on LLM and vector database	May 19, 2025	Data Augmentation, In-Context Learning	Unverified	0
SEED-GRPO: Semantic Entropy Enhanced GRPO for Uncertainty-Aware Policy Optimization	May 18, 2025	Math, Mathematical Reasoning	Unverified	0
MARGE: Improving Math Reasoning for LLMs with Guided Exploration	May 18, 2025	Math, Mathematical Reasoning	Code Available	0
MoL for LLMs: Dual-Loss Optimization to Enhance Domain Expertise While Preserving General Capabilities	May 17, 2025	Math	Unverified	0
LoRASuite: Efficient LoRA Adaptation Across Large Language Model Upgrades	May 17, 2025	Language Modeling, Language Modelling	Unverified	0
HARDMath2: A Benchmark for Applied Mathematics Built by Students as Part of a Graduate Class	May 17, 2025	Math, Mathematical Problem-Solving	Code Available	0
SelfBudgeter: Adaptive Token Allocation for Efficient LLM Reasoning	May 16, 2025	Math	Unverified	0
HAPO: Training Language Models to Reason Concisely via History-Aware Policy Optimization	May 16, 2025	Math	Code Available	0
Critique-Guided Distillation: Improving Supervised Fine-tuning via Better Distillation	May 16, 2025	Math, MMLU	Unverified	0
Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models	May 15, 2025	Code Generation, GSM8K	Unverified	0
DIF: A Framework for Benchmarking and Verifying Implicit Bias in LLMs	May 15, 2025	Benchmarking, Fairness	Unverified	0
Towards a Deeper Understanding of Reasoning Capabilities in Large Language Models	May 15, 2025	Large Language Model, Math	Code Available	0
PT-MoE: An Efficient Finetuning Framework for Integrating Mixture-of-Experts into Prompt Tuning	May 14, 2025	Math, Mathematical Problem-Solving	Code Available	0
Accelerating Chain-of-Thought Reasoning: When Goal-Gradient Importance Meets Dynamic Skipping	May 13, 2025	Domain Generalization, GSM8K	Unverified	0
Measurement to Meaning: A Validity-Centered Framework for AI Evaluation	May 13, 2025	Math	Unverified	0
S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models	May 12, 2025	GSM8K, Large Language Model	Unverified	0
Learning from Peers in Reasoning Models	May 12, 2025	Math	Unverified	0
Multimodal Assessment of Classroom Discourse Quality: A Text-Centered Attention-Based Multi-Task Learning Approach	May 12, 2025	Math, Multi-Task Learning	Unverified	0
DialogueReason: Rule-Based RL Sparks Dialogue Reasoning in LLMs	May 11, 2025	Diversity, Math	Unverified	0
xGen-small Technical Report	May 10, 2025	Decoder, Math	Unverified	0

Page 26 of 64