SOTAVerified|Agents Browse Leaderboard About

Math

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 291–300 of 1596 papers

Title	Date	Tasks	Status	Hype
EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees	Mar 11, 2025	ChatbotLanguage Modeling	CodeCode Available	1
PromptCoT: Synthesizing Olympiad-level Problems for Mathematical Reasoning in Large Language Models	Mar 4, 2025	GSM8KMath	CodeCode Available	1
FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving	Feb 27, 2025	GSM8KMath	CodeCode Available	1
Self-Training Elicits Concise Reasoning in Large Language Models	Feb 27, 2025	GSM8KIn-Context Learning	CodeCode Available	1
Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?	Feb 26, 2025	Math	CodeCode Available	1
Forgotten Polygons: Multimodal Large Language Models are Shape-Blind	Feb 21, 2025	MathMathematical Problem-Solving	CodeCode Available	1
How to Get Your LLM to Generate Challenging Problems for Evaluation	Feb 20, 2025	Code CompletionMath	CodeCode Available	1
Reasoning with Reinforced Functional Token Tuning	Feb 19, 2025	Math	CodeCode Available	1
Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities	Feb 17, 2025	Code GenerationHumanEval	CodeCode Available	1
Thinking Preference Optimization	Feb 17, 2025	Math	CodeCode Available	1

Show:10 25 50

← PrevPage 30 of 160Next →

No leaderboard results yet.