
Math

Papers


Showing 701–725 of 1596 papers

Title	Date	Tasks	Status	Hype
Effective Skill Unlearning through Intervention and Abstention	Mar 27, 2025	General Knowledge, Math	Code Available	0
Challenging the Boundaries of Reasoning: An Olympiad-Level Math Benchmark for Large Language Models	Mar 27, 2025	Data Visualization, Math	Code Available	0
Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators	Mar 25, 2025	Math	Unverified	0
1.4 Million Open-Source Distilled Reasoning Dataset to Empower Large Language Model Training	Mar 25, 2025	Language Modeling, Language Modelling	Unverified	0
Gemma 3 Technical Report	Mar 25, 2025	Instruction Following, Math	Unverified	0
Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking	Mar 25, 2025	Math, Reinforcement Learning (RL)	Unverified	0
Teaching LLMs for Step-Level Automatic Math Correction via Reinforcement Learning	Mar 24, 2025	Language Modeling, Language Modelling	Unverified	0
Activation Functions Considered Harmful: Recovering Neural Network Weights through Controlled Channels	Mar 24, 2025	Math	Unverified	0
Overcoming Vocabulary Mismatch: Vocabulary-agnostic Teacher Guided Language Modeling	Mar 24, 2025	Continual Pretraining, Language Modeling	Unverified	0
MathAgent: Leveraging a Mixture-of-Math-Agent Framework for Real-World Multimodal Mathematical Error Detection	Mar 23, 2025	Math, Mathematical Problem-Solving	Unverified	0
Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts?	Mar 23, 2025	GSM8K, Math	Code Available	0
Long Is More Important Than Difficult for Training Reasoning Models	Mar 23, 2025	Math	Unverified	0
ChatBench: From Static Benchmarks to Human-AI Evaluation	Mar 22, 2025	Math, MMLU	Code Available	0
Exploring the Hidden Reasoning Process of Large Language Models by Misleading Them	Mar 20, 2025	Math, Memorization	Unverified	0
BurTorch: Revisiting Training from First Principles by Coupling Autodiff, Math Optimization, and Systems	Mar 18, 2025	CPU, Math	Unverified	0
Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs	Mar 18, 2025	GSM8K, Math	Unverified	0
Improving Complex Reasoning with Dynamic Prompt Corruption: A soft prompt Optimization Approach	Mar 17, 2025	GSM8K, Math	Unverified	0
Pensez: Less Data, Better Reasoning -- Rethinking French LLM	Mar 17, 2025	Large Language Model, Math	Unverified	0
SPIN-Bench: How Well Do LLMs Plan Strategically and Reason Socially?	Mar 16, 2025	Board Games, Card Games	Unverified	0
The Impact of Item-Writing Flaws on Difficulty and Discrimination in Item Response Theory	Mar 13, 2025	Math, Multiple-choice	Unverified	0
StepMathAgent: A Step-Wise Agent for Evaluating Mathematical Processes through Tree-of-Error	Mar 13, 2025	Math	Code Available	0
Understanding the Logical Capabilities of Large Language Models via Out-of-Context Representation Learning	Mar 13, 2025	In-Context Learning, Math	Unverified	0
Conformal Prediction Sets for Deep Generative Models via Reduction to Conformal Regression	Mar 13, 2025	Code Generation, Conformal Prediction	Unverified	0
Chat-TS: Enhancing Multi-Modal Reasoning Over Time-Series and Natural Language Data	Mar 13, 2025	Large Language Model, Math	Unverified	0
From Text to Visuals: Using LLMs to Generate Math Diagrams with Vector Graphics	Mar 10, 2025	Math, Question Answering	Unverified	0

