Math

Papers

Showing 601–625 of 1596 papers

Title	Date	Tasks	Status	Hype
Anchored Diffusion Language Model	May 24, 2025	Language Modeling, Language Modelling	Unverified	0
MSA at BEA 2025 Shared Task: Disagreement-Aware Instruction Tuning for Multi-Dimensional Evaluation of LLMs as Math Tutors	May 24, 2025	Language Modeling, Language Modelling	Unverified	0
On the Effect of Negative Gradient in Group Relative Deep Reinforcement Optimization	May 24, 2025	Math, Reinforcement Learning (RL)	Unverified	0
How Is LLM Reasoning Distracted by Irrelevant Context? An Analysis Using a Controlled Benchmark	May 24, 2025	Math	Code Available	0
Outcome-based Reinforcement Learning to Predict the Future	May 23, 2025	Holdout Set, Math	Unverified	0
More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models	May 23, 2025	Diagnostic, Hallucination	Unverified	0
One RL to See Them All: Visual Triple Unified Reinforcement Learning	May 23, 2025	All, Math	Unverified	0
The Unreasonable Effectiveness of Model Merging for Cross-Lingual Transfer in LLMs	May 23, 2025	Cross-Lingual Transfer, Math	Unverified	0
VideoGameBench: Can Vision-Language Models complete popular video games?	May 23, 2025	Math	Unverified	0
AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning	May 22, 2025	Math, reinforcement-learning	Unverified	0
EquivPruner: Boosting Efficiency and Quality in LLM-Based Search via Action Pruning	May 22, 2025	GSM8K, Math	Code Available	0
SATURN: SAT-based Reinforcement Learning to Unleash Language Model Reasoning	May 22, 2025	Language Modeling, Language Modelling	Code Available	0
RBench-V: A Primary Assessment for Visual Reasoning Models with Multi-modal Outputs	May 22, 2025	Image Manipulation, Math	Unverified	0
Incremental Sequence Classification with Temporal Consistency	May 22, 2025	Classification, Language Modeling	Unverified	0
ConciseRL: Conciseness-Guided Reinforcement Learning for Efficient Reasoning Models	May 22, 2025	Large Language Model, Math	Code Available	0
Veracity Bias and Beyond: Uncovering LLMs' Hidden Beliefs in Problem-Solving Reasoning	May 22, 2025	Attribute, Math	Unverified	0
X-MAS: Towards Building Multi-Agent Systems with Heterogeneous LLMs	May 22, 2025	Chatbot, Math	Code Available	0
Can LLMs understand Math? -- Exploring the Pitfalls in Mathematical Reasoning	May 21, 2025	Math, Mathematical Reasoning	Unverified	0
Towards Spoken Mathematical Reasoning: Benchmarking Speech-based Models over Multi-faceted Math Problems	May 21, 2025	Benchmarking, Math	Unverified	0
MAPS: A Multilingual Benchmark for Global Agent Performance and Security	May 21, 2025	Code Generation, Math	Unverified	0
Learning to Rank Chain-of-Thought: An Energy-Based Approach with Outcome Supervision	May 21, 2025	GSM8K, Learning-To-Rank	Unverified	0
SSR: Speculative Parallel Scaling Reasoning in Test-time	May 21, 2025	Diversity, Math	Unverified	0
Thought-Augmented Policy Optimization: Bridging External Guidance and Internal Capabilities	May 21, 2025	Math, Reinforcement Learning (RL)	Unverified	0
How Should We Enhance the Safety of Large Reasoning Models: An Empirical Study	May 21, 2025	Math	Code Available	0
MIRB: Mathematical Information Retrieval Benchmark	May 21, 2025	Automated Theorem Proving, Information Retrieval	Code Available	0

