SOTAVerified

Math

Papers

Showing 151-200 of 1596 papers

Title | Status | Hype
EquivPruner: Boosting Efficiency and Quality in LLM-Based Search via Action Pruning | Code | 0
SATURN: SAT-based Reinforcement Learning to Unleash Language Model Reasoning | Code | 0
Veracity Bias and Beyond: Uncovering LLMs' Hidden Beliefs in Problem-Solving Reasoning | | 0
Unlearning Isn't Deletion: Investigating Reversibility of Machine Unlearning in LLMs | Code | 1
X-MAS: Towards Building Multi-Agent Systems with Heterogeneous LLMs | Code | 0
Training Step-Level Reasoning Verifiers with Formal Verification Tools | Code | 1
Towards Spoken Mathematical Reasoning: Benchmarking Speech-based Models over Multi-faceted Math Problems | | 0
MAPS: A Multilingual Benchmark for Global Agent Performance and Security | | 0
How Should We Enhance the Safety of Large Reasoning Models: An Empirical Study | Code | 0
Can LLMs understand Math? -- Exploring the Pitfalls in Mathematical Reasoning | | 0
Meta-Design Matters: A Self-Design Multi-Agent System | Code | 2
RL Tango: Reinforcing Generator and Verifier Together for Language Reasoning | Code | 2
Learning to Rank Chain-of-Thought: An Energy-Based Approach with Outcome Supervision | | 0
ModelingAgent: Bridging LLMs and Mathematical Modeling for Real-World Challenges | Code | 1
SSR: Speculative Parallel Scaling Reasoning in Test-time | | 0
Thought-Augmented Policy Optimization: Bridging External Guidance and Internal Capabilities | | 0
The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning | Code | 1
MIRB: Mathematical Information Retrieval Benchmark | Code | 0
EasyMath: A 0-shot Math Benchmark for SLMs | | 0
Unearthing Gems from Stones: Policy Optimization with Negative Sample Augmentation for LLM Reasoning | | 0
RL of Thoughts: Navigating LLM Reasoning with Inference-time Reinforcement Learning | | 0
Let's Verify Math Questions Step by Step | Code | 1
TinyV: Reducing False Negatives in Verification Improves RL for LLM Reasoning | Code | 1
The Hallucination Tax of Reinforcement Finetuning | | 0
General-Reasoner: Advancing LLM Reasoning Across All Domains | Code | 3
Warm Up Before You Train: Unlocking General Reasoning in Resource-Constrained Settings | Code | 0
Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space | Code | 2
AutoMathKG: The automated mathematical knowledge graph based on LLM and vector database | | 0
AdaptThink: Reasoning Models Can Learn When to Think | Code | 2
Thinkless: LLM Learns When to Think | Code | 3
MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision | Code | 4
Synthetic Data RL: Task Definition Is All You Need | Code | 2
Efficient RL Training for Reasoning Models via Length-Aware Optimization | Code | 1
SEED-GRPO: Semantic Entropy Enhanced GRPO for Uncertainty-Aware Policy Optimization | | 0
MARGE: Improving Math Reasoning for LLMs with Guided Exploration | Code | 0
HARDMath2: A Benchmark for Applied Mathematics Built by Students as Part of a Graduate Class | Code | 0
LoRASuite: Efficient LoRA Adaptation Across Large Language Model Upgrades | | 0
MoL for LLMs: Dual-Loss Optimization to Enhance Domain Expertise While Preserving General Capabilities | | 0
HALO: Hierarchical Autonomous Logic-Oriented Orchestration for Multi-Agent LLM Systems | Code | 1
MedCaseReasoning: Evaluating and learning diagnostic reasoning from clinical case reports | Code | 1
Critique-Guided Distillation: Improving Supervised Fine-tuning via Better Distillation | | 0
SelfBudgeter: Adaptive Token Allocation for Efficient LLM Reasoning | | 0
HAPO: Training Language Models to Reason Concisely via History-Aware Policy Optimization | Code | 0
MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning | Code | 3
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models | Code | 2
Towards a Deeper Understanding of Reasoning Capabilities in Large Language Models | Code | 0
Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models | | 0
DIF: A Framework for Benchmarking and Verifying Implicit Bias in LLMs | | 0
PT-MoE: An Efficient Finetuning Framework for Integrating Mixture-of-Experts into Prompt Tuning | Code | 0
Measurement to Meaning: A Validity-Centered Framework for AI Evaluation | | 0
Page 4 of 32

No leaderboard results yet.