SOTAVerified

Math

Papers


Showing 726–750 of 1596 papers

Title	Date	Tasks	Status	Hype
From Text to Visuals: Using LLMs to Generate Math Diagrams with Vector Graphics	Mar 10, 2025	Math, Question Answering	Unverified	0
Decoding the Black Box: Integrating Moral Imagination with Technical AI Governance	Mar 9, 2025	Ethics, Math	Unverified	0
InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models	Mar 9, 2025	Computational Efficiency, Math	Unverified	0
Symbolic Mixture-of-Experts: Adaptive Skill-based Routing for Heterogeneous Reasoning	Mar 7, 2025	GPU, Math	Unverified	0
START: Self-taught Reasoner with Tools	Mar 6, 2025	Math, Self-Learning	Unverified	0
Better Process Supervision with Bi-directional Rewarding Signals	Mar 6, 2025	Language Modeling, Language Modelling	Unverified	0
Benchmarking Reasoning Robustness in Large Language Models	Mar 6, 2025	Benchmarking, Math	Unverified	0
SOLAR: Scalable Optimization of Large-scale Architecture for Reasoning	Mar 6, 2025	GSM8K, Math	Unverified	0
HelpSteer3: Human-Annotated Feedback and Edit Data to Empower Inference-Time Scaling in Open-Ended General-Domain Tasks	Mar 6, 2025	Chatbot, Logical Reasoning	Unverified	0
Compositional Causal Reasoning Evaluation in Language Models	Mar 6, 2025	Math	Unverified	0
Performance Comparison of Large Language Models on Advanced Calculus Problems	Mar 5, 2025	Math, Mathematical Problem-Solving	Unverified	0
LEWIS (LayEr WIse Sparsity) -- A Training Free Guided Model Merging Approach	Mar 5, 2025	Instruction Following, Math	Unverified	0
FANS -- Formal Answer Selection for Natural Language Math Reasoning Using Lean4	Mar 5, 2025	Answer Selection, Math	Unverified	0
Self-Evolved Preference Optimization for Enhancing Mathematical Reasoning in Small Language Models	Mar 4, 2025	GSM8K, Math	Unverified	0
When an LLM is apprehensive about its answers -- and when its uncertainty is justified	Mar 3, 2025	Math, MMLU	Code Available	0
What's Behind PPO's Collapse in Long-CoT? Value Optimization Holds the Secret	Mar 3, 2025	Math, Reinforcement Learning (RL)	Unverified	0
Cats Confuse Reasoning LLM: Query Agnostic Adversarial Triggers for Reasoning Models	Mar 3, 2025	Math	Unverified	0
MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts	Feb 28, 2025	Math, Mathematical Reasoning	Unverified	0
MAMUT: A Novel Framework for Modifying Mathematical Formulas for the Generation of Specialized Datasets for Language Model Training	Feb 28, 2025	Language Modeling, Language Modelling	Code Available	0
Med-RLVR: Emerging Medical Reasoning from a 3B base model via reinforcement Learning	Feb 27, 2025	Math, Medical Question Answering	Unverified	0
Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning	Feb 25, 2025	Math, Mathematical Reasoning	Unverified	0
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution	Feb 25, 2025	Math, Reinforcement Learning (RL)	Unverified	0
Reasoning with Latent Thoughts: On the Power of Looped Transformers	Feb 24, 2025	Language Modeling, Language Modelling	Unverified	0
Learning Decentralized Swarms Using Rotation Equivariant Graph Neural Networks	Feb 24, 2025	Graph Neural Network, Math	Code Available	0
Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning	Feb 24, 2025	Math, Mathematical Reasoning	Code Available	0



No leaderboard results yet.