SOTAVerified

Math

Papers

Showing 701–725 of 1596 papers

Title | Status | Hype
Effective Skill Unlearning through Intervention and Abstention | Code | 0
Challenging the Boundaries of Reasoning: An Olympiad-Level Math Benchmark for Large Language Models | Code | 0
Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators | | 0
1.4 Million Open-Source Distilled Reasoning Dataset to Empower Large Language Model Training | | 0
Gemma 3 Technical Report | | 0
Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking | | 0
Teaching LLMs for Step-Level Automatic Math Correction via Reinforcement Learning | | 0
Activation Functions Considered Harmful: Recovering Neural Network Weights through Controlled Channels | | 0
Overcoming Vocabulary Mismatch: Vocabulary-agnostic Teacher Guided Language Modeling | | 0
MathAgent: Leveraging a Mixture-of-Math-Agent Framework for Real-World Multimodal Mathematical Error Detection | | 0
Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts? | Code | 0
Long Is More Important Than Difficult for Training Reasoning Models | | 0
ChatBench: From Static Benchmarks to Human-AI Evaluation | Code | 0
Exploring the Hidden Reasoning Process of Large Language Models by Misleading Them | | 0
BurTorch: Revisiting Training from First Principles by Coupling Autodiff, Math Optimization, and Systems | | 0
Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs | | 0
Improving Complex Reasoning with Dynamic Prompt Corruption: A soft prompt Optimization Approach | | 0
Pensez: Less Data, Better Reasoning -- Rethinking French LLM | | 0
SPIN-Bench: How Well Do LLMs Plan Strategically and Reason Socially? | | 0
The Impact of Item-Writing Flaws on Difficulty and Discrimination in Item Response Theory | | 0
StepMathAgent: A Step-Wise Agent for Evaluating Mathematical Processes through Tree-of-Error | Code | 0
Understanding the Logical Capabilities of Large Language Models via Out-of-Context Representation Learning | | 0
Conformal Prediction Sets for Deep Generative Models via Reduction to Conformal Regression | | 0
Chat-TS: Enhancing Multi-Modal Reasoning Over Time-Series and Natural Language Data | | 0
From Text to Visuals: Using LLMs to Generate Math Diagrams with Vector Graphics | | 0

No leaderboard results yet.