Math

Papers


Showing 601–650 of 1596 papers

Title	Date	Tasks	Status
Anchored Diffusion Language Model	May 24, 2025	Language Modeling, Language Modelling	Unverified
How Is LLM Reasoning Distracted by Irrelevant Context? An Analysis Using a Controlled Benchmark	May 24, 2025	Math	Code Available
MSA at BEA 2025 Shared Task: Disagreement-Aware Instruction Tuning for Multi-Dimensional Evaluation of LLMs as Math Tutors	May 24, 2025	Language Modeling, Language Modelling	Unverified
On the Effect of Negative Gradient in Group Relative Deep Reinforcement Optimization	May 24, 2025	Math, Reinforcement Learning (RL)	Unverified
VideoGameBench: Can Vision-Language Models complete popular video games?	May 23, 2025	Math	Unverified
Outcome-based Reinforcement Learning to Predict the Future	May 23, 2025	Holdout Set, Math	Unverified
More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models	May 23, 2025	Diagnostic, Hallucination	Unverified
The Unreasonable Effectiveness of Model Merging for Cross-Lingual Transfer in LLMs	May 23, 2025	Cross-Lingual Transfer, Math	Unverified
One RL to See Them All: Visual Triple Unified Reinforcement Learning	May 23, 2025	All, Math	Unverified
AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning	May 22, 2025	Math, reinforcement-learning	Unverified
X-MAS: Towards Building Multi-Agent Systems with Heterogeneous LLMs	May 22, 2025	Chatbot, Math	Code Available
EquivPruner: Boosting Efficiency and Quality in LLM-Based Search via Action Pruning	May 22, 2025	GSM8K, Math	Code Available
SATURN: SAT-based Reinforcement Learning to Unleash Language Model Reasoning	May 22, 2025	Language Modeling, Language Modelling	Code Available
Incremental Sequence Classification with Temporal Consistency	May 22, 2025	Classification, Language Modeling	Unverified
ConciseRL: Conciseness-Guided Reinforcement Learning for Efficient Reasoning Models	May 22, 2025	Large Language Model, Math	Code Available
Veracity Bias and Beyond: Uncovering LLMs' Hidden Beliefs in Problem-Solving Reasoning	May 22, 2025	Attribute, Math	Unverified
RBench-V: A Primary Assessment for Visual Reasoning Models with Multi-modal Outputs	May 22, 2025	Image Manipulation, Math	Unverified
Can LLMs understand Math? -- Exploring the Pitfalls in Mathematical Reasoning	May 21, 2025	Math, Mathematical Reasoning	Unverified
Towards Spoken Mathematical Reasoning: Benchmarking Speech-based Models over Multi-faceted Math Problems	May 21, 2025	Benchmarking, Math	Unverified
SSR: Speculative Parallel Scaling Reasoning in Test-time	May 21, 2025	Diversity, Math	Unverified
MIRB: Mathematical Information Retrieval Benchmark	May 21, 2025	Automated Theorem Proving, Information Retrieval	Code Available
How Should We Enhance the Safety of Large Reasoning Models: An Empirical Study	May 21, 2025	Math	Code Available
Learning to Rank Chain-of-Thought: An Energy-Based Approach with Outcome Supervision	May 21, 2025	GSM8K, Learning-To-Rank	Unverified
Thought-Augmented Policy Optimization: Bridging External Guidance and Internal Capabilities	May 21, 2025	Math, Reinforcement Learning (RL)	Unverified
MAPS: A Multilingual Benchmark for Global Agent Performance and Security	May 21, 2025	Code Generation, Math	Unverified
RL of Thoughts: Navigating LLM Reasoning with Inference-time Reinforcement Learning	May 20, 2025	Math, Reinforcement Learning (RL)	Unverified
EasyMath: A 0-shot Math Benchmark for SLMs	May 20, 2025	Math	Unverified
The Hallucination Tax of Reinforcement Finetuning	May 20, 2025	Hallucination, Math	Unverified
Unearthing Gems from Stones: Policy Optimization with Negative Sample Augmentation for LLM Reasoning	May 20, 2025	Math, Offline RL	Unverified
Warm Up Before You Train: Unlocking General Reasoning in Resource-Constrained Settings	May 19, 2025	HumanEval, Math	Code Available
AutoMathKG: The automated mathematical knowledge graph based on LLM and vector database	May 19, 2025	Data Augmentation, In-Context Learning	Unverified
SEED-GRPO: Semantic Entropy Enhanced GRPO for Uncertainty-Aware Policy Optimization	May 18, 2025	Math, Mathematical Reasoning	Unverified
MARGE: Improving Math Reasoning for LLMs with Guided Exploration	May 18, 2025	Math, Mathematical Reasoning	Code Available
MoL for LLMs: Dual-Loss Optimization to Enhance Domain Expertise While Preserving General Capabilities	May 17, 2025	Math	Unverified
LoRASuite: Efficient LoRA Adaptation Across Large Language Model Upgrades	May 17, 2025	Language Modeling, Language Modelling	Unverified
HARDMath2: A Benchmark for Applied Mathematics Built by Students as Part of a Graduate Class	May 17, 2025	Math, Mathematical Problem-Solving	Code Available
SelfBudgeter: Adaptive Token Allocation for Efficient LLM Reasoning	May 16, 2025	Math	Unverified
HAPO: Training Language Models to Reason Concisely via History-Aware Policy Optimization	May 16, 2025	Math	Code Available
Critique-Guided Distillation: Improving Supervised Fine-tuning via Better Distillation	May 16, 2025	Math, MMLU	Unverified
Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models	May 15, 2025	Code Generation, GSM8K	Unverified
DIF: A Framework for Benchmarking and Verifying Implicit Bias in LLMs	May 15, 2025	Benchmarking, Fairness	Unverified
Towards a Deeper Understanding of Reasoning Capabilities in Large Language Models	May 15, 2025	Large Language Model, Math	Code Available
PT-MoE: An Efficient Finetuning Framework for Integrating Mixture-of-Experts into Prompt Tuning	May 14, 2025	Math, Mathematical Problem-Solving	Code Available
Accelerating Chain-of-Thought Reasoning: When Goal-Gradient Importance Meets Dynamic Skipping	May 13, 2025	Domain Generalization, GSM8K	Unverified
Measurement to Meaning: A Validity-Centered Framework for AI Evaluation	May 13, 2025	Math	Unverified
S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models	May 12, 2025	GSM8K, Large Language Model	Unverified
Learning from Peers in Reasoning Models	May 12, 2025	Math	Unverified
Multimodal Assessment of Classroom Discourse Quality: A Text-Centered Attention-Based Multi-Task Learning Approach	May 12, 2025	Math, Multi-Task Learning	Unverified
DialogueReason: Rule-Based RL Sparks Dialogue Reasoning in LLMs	May 11, 2025	Diversity, Math	Unverified
xGen-small Technical Report	May 10, 2025	Decoder, Math	Unverified

