SOTAVerified

Math

Papers

Showing 151-175 of 1596 papers

Title | Status | Hype
EquivPruner: Boosting Efficiency and Quality in LLM-Based Search via Action Pruning | Code | 0
Incremental Sequence Classification with Temporal Consistency | - | 0
Veracity Bias and Beyond: Uncovering LLMs' Hidden Beliefs in Problem-Solving Reasoning | - | 0
X-MAS: Towards Building Multi-Agent Systems with Heterogeneous LLMs | Code | 0
Unlearning Isn't Deletion: Investigating Reversibility of Machine Unlearning in LLMs | Code | 1
Training Step-Level Reasoning Verifiers with Formal Verification Tools | Code | 1
Towards Spoken Mathematical Reasoning: Benchmarking Speech-based Models over Multi-faceted Math Problems | - | 0
How Should We Enhance the Safety of Large Reasoning Models: An Empirical Study | Code | 0
MAPS: A Multilingual Benchmark for Global Agent Performance and Security | - | 0
SSR: Speculative Parallel Scaling Reasoning in Test-time | - | 0
MIRB: Mathematical Information Retrieval Benchmark | Code | 0
The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning | Code | 1
Learning to Rank Chain-of-Thought: An Energy-Based Approach with Outcome Supervision | - | 0
ModelingAgent: Bridging LLMs and Mathematical Modeling for Real-World Challenges | Code | 1
RL Tango: Reinforcing Generator and Verifier Together for Language Reasoning | Code | 2
Can LLMs understand Math? -- Exploring the Pitfalls in Mathematical Reasoning | - | 0
Meta-Design Matters: A Self-Design Multi-Agent System | Code | 2
Thought-Augmented Policy Optimization: Bridging External Guidance and Internal Capabilities | - | 0
EasyMath: A 0-shot Math Benchmark for SLMs | - | 0
RL of Thoughts: Navigating LLM Reasoning with Inference-time Reinforcement Learning | - | 0
Unearthing Gems from Stones: Policy Optimization with Negative Sample Augmentation for LLM Reasoning | - | 0
Let's Verify Math Questions Step by Step | Code | 1
The Hallucination Tax of Reinforcement Finetuning | - | 0
General-Reasoner: Advancing LLM Reasoning Across All Domains | Code | 3
TinyV: Reducing False Negatives in Verification Improves RL for LLM Reasoning | Code | 1

No leaderboard results yet.