SOTAVerified

Math

Papers

Showing 601–650 of 1596 papers

Title | Status | Hype
Anchored Diffusion Language Model | | 0
How Is LLM Reasoning Distracted by Irrelevant Context? An Analysis Using a Controlled Benchmark | Code | 0
MSA at BEA 2025 Shared Task: Disagreement-Aware Instruction Tuning for Multi-Dimensional Evaluation of LLMs as Math Tutors | | 0
On the Effect of Negative Gradient in Group Relative Deep Reinforcement Optimization | | 0
VideoGameBench: Can Vision-Language Models complete popular video games? | | 0
Outcome-based Reinforcement Learning to Predict the Future | | 0
More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models | | 0
The Unreasonable Effectiveness of Model Merging for Cross-Lingual Transfer in LLMs | | 0
One RL to See Them All: Visual Triple Unified Reinforcement Learning | | 0
AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning | | 0
X-MAS: Towards Building Multi-Agent Systems with Heterogeneous LLMs | Code | 0
EquivPruner: Boosting Efficiency and Quality in LLM-Based Search via Action Pruning | Code | 0
SATURN: SAT-based Reinforcement Learning to Unleash Language Model Reasoning | Code | 0
Incremental Sequence Classification with Temporal Consistency | | 0
ConciseRL: Conciseness-Guided Reinforcement Learning for Efficient Reasoning Models | Code | 0
Veracity Bias and Beyond: Uncovering LLMs' Hidden Beliefs in Problem-Solving Reasoning | | 0
RBench-V: A Primary Assessment for Visual Reasoning Models with Multi-modal Outputs | | 0
Can LLMs understand Math? -- Exploring the Pitfalls in Mathematical Reasoning | | 0
Towards Spoken Mathematical Reasoning: Benchmarking Speech-based Models over Multi-faceted Math Problems | | 0
SSR: Speculative Parallel Scaling Reasoning in Test-time | | 0
MIRB: Mathematical Information Retrieval Benchmark | Code | 0
How Should We Enhance the Safety of Large Reasoning Models: An Empirical Study | Code | 0
Learning to Rank Chain-of-Thought: An Energy-Based Approach with Outcome Supervision | | 0
Thought-Augmented Policy Optimization: Bridging External Guidance and Internal Capabilities | | 0
MAPS: A Multilingual Benchmark for Global Agent Performance and Security | | 0
RL of Thoughts: Navigating LLM Reasoning with Inference-time Reinforcement Learning | | 0
EasyMath: A 0-shot Math Benchmark for SLMs | | 0
The Hallucination Tax of Reinforcement Finetuning | | 0
Unearthing Gems from Stones: Policy Optimization with Negative Sample Augmentation for LLM Reasoning | | 0
Warm Up Before You Train: Unlocking General Reasoning in Resource-Constrained Settings | Code | 0
AutoMathKG: The automated mathematical knowledge graph based on LLM and vector database | | 0
SEED-GRPO: Semantic Entropy Enhanced GRPO for Uncertainty-Aware Policy Optimization | | 0
MARGE: Improving Math Reasoning for LLMs with Guided Exploration | Code | 0
MoL for LLMs: Dual-Loss Optimization to Enhance Domain Expertise While Preserving General Capabilities | | 0
LoRASuite: Efficient LoRA Adaptation Across Large Language Model Upgrades | | 0
HARDMath2: A Benchmark for Applied Mathematics Built by Students as Part of a Graduate Class | Code | 0
SelfBudgeter: Adaptive Token Allocation for Efficient LLM Reasoning | | 0
HAPO: Training Language Models to Reason Concisely via History-Aware Policy Optimization | Code | 0
Critique-Guided Distillation: Improving Supervised Fine-tuning via Better Distillation | | 0
Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models | | 0
DIF: A Framework for Benchmarking and Verifying Implicit Bias in LLMs | | 0
Towards a Deeper Understanding of Reasoning Capabilities in Large Language Models | Code | 0
PT-MoE: An Efficient Finetuning Framework for Integrating Mixture-of-Experts into Prompt Tuning | Code | 0
Accelerating Chain-of-Thought Reasoning: When Goal-Gradient Importance Meets Dynamic Skipping | | 0
Measurement to Meaning: A Validity-Centered Framework for AI Evaluation | | 0
S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models | | 0
Learning from Peers in Reasoning Models | | 0
Multimodal Assessment of Classroom Discourse Quality: A Text-Centered Attention-Based Multi-Task Learning Approach | | 0
DialogueReason: Rule-Based RL Sparks Dialogue Reasoning in LLMs | | 0
xGen-small Technical Report | | 0
Page 13 of 32

No leaderboard results yet.