SOTAVerified

Math

Papers

Showing 126-150 of 1596 papers

| Title | Status | Hype |
| --- | --- | --- |
| Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles | | 0 |
| Inference-time Alignment in Continuous Space | Code | 0 |
| AI4Math: A Native Spanish Benchmark for University-Level Mathematical Reasoning in Large Language Models | | 0 |
| MMATH: A Multilingual Benchmark for Mathematical Reasoning | Code | 0 |
| Steering LLM Reasoning Through Bias-Only Adaptation | | 0 |
| Enumerate-Conjecture-Prove: Formally Solving Answer-Construction Problems in Math Competitions | Code | 0 |
| Does Representation Intervention Really Identify Desired Concepts and Elicit Alignment? | | 0 |
| MSA at BEA 2025 Shared Task: Disagreement-Aware Instruction Tuning for Multi-Dimensional Evaluation of LLMs as Math Tutors | | 0 |
| On the Effect of Negative Gradient in Group Relative Deep Reinforcement Optimization | | 0 |
| Anchored Diffusion Language Model | | 0 |
| How Is LLM Reasoning Distracted by Irrelevant Context? An Analysis Using a Controlled Benchmark | Code | 0 |
| More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models | | 0 |
| Decoupled Visual Interpretation and Linguistic Reasoning for Math Problem Solving | Code | 1 |
| VideoGameBench: Can Vision-Language Models complete popular video games? | | 0 |
| One RL to See Them All: Visual Triple Unified Reinforcement Learning | | 0 |
| Value-Guided Search for Efficient Chain-of-Thought Reasoning | Code | 1 |
| Towards Revealing the Effectiveness of Small-Scale Fine-tuning in R1-style Reinforcement Learning | Code | 1 |
| Outcome-based Reinforcement Learning to Predict the Future | | 0 |
| The Unreasonable Effectiveness of Model Merging for Cross-Lingual Transfer in LLMs | | 0 |
| RaDeR: Reasoning-aware Dense Retrieval Models | Code | 1 |
| ConciseRL: Conciseness-Guided Reinforcement Learning for Efficient Reasoning Models | Code | 0 |
| AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning | | 0 |
| WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning | Code | 2 |
| Incremental Sequence Classification with Temporal Consistency | | 0 |
| Veracity Bias and Beyond: Uncovering LLMs' Hidden Beliefs in Problem-Solving Reasoning | | 0 |
Page 6 of 64

No leaderboard results yet.