SOTAVerified|Agents Browse Leaderboard About

Math

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 276–300 of 1596 papers

Title	Date	Tasks	Status	Hype
Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency	Apr 24, 2025	BenchmarkingMath	CodeCode Available	1
Fine-Tuning Large Language Models on Quantum Optimization Problems for Circuit Generation	Apr 15, 2025	MathQuantum Machine Learning	CodeCode Available	1
The Jailbreak Tax: How Useful are Your Jailbreak Outputs?	Apr 14, 2025	Math	CodeCode Available	1
Efficient Process Reward Model Training via Active Learning	Apr 14, 2025	Active LearningMath	CodeCode Available	1
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models	Apr 14, 2025	MambaMath	CodeCode Available	1
Task-Circuit Quantization: Leveraging Knowledge Localization and Interpretability for Compression	Apr 10, 2025	MathMMLU	CodeCode Available	1
MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models	Apr 8, 2025	MathMultimodal Reasoning	CodeCode Available	1
Large (Vision) Language Models are Unsupervised In-Context Learners	Apr 3, 2025	GSM8KIn-Context Learning	CodeCode Available	1
BlenderGym: Benchmarking Foundational Model Systems for Graphics Editing	Apr 2, 2025	3D ReconstructionBenchmarking	CodeCode Available	1
Entropy-Based Adaptive Weighting for Self-Training	Mar 31, 2025	GSM8KMath	CodeCode Available	1
QuestBench: Can LLMs ask the right question to acquire information in reasoning tasks?	Mar 28, 2025	Logical ReasoningMath	CodeCode Available	1
ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models	Mar 27, 2025	Math	CodeCode Available	1
LogQuant: Log-Distributed 2-Bit Quantization of KV Cache with Superior Accuracy Preservation	Mar 25, 2025	Code CompletionLanguage Modeling	CodeCode Available	1
EXAONE Deep: Reasoning Enhanced Language Models	Mar 16, 2025	Math	CodeCode Available	1
VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search	Mar 13, 2025	Image RetrievalMath	CodeCode Available	1
EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees	Mar 11, 2025	ChatbotLanguage Modeling	CodeCode Available	1
PromptCoT: Synthesizing Olympiad-level Problems for Mathematical Reasoning in Large Language Models	Mar 4, 2025	GSM8KMath	CodeCode Available	1
FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving	Feb 27, 2025	GSM8KMath	CodeCode Available	1
Self-Training Elicits Concise Reasoning in Large Language Models	Feb 27, 2025	GSM8KIn-Context Learning	CodeCode Available	1
Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?	Feb 26, 2025	Math	CodeCode Available	1
Forgotten Polygons: Multimodal Large Language Models are Shape-Blind	Feb 21, 2025	MathMathematical Problem-Solving	CodeCode Available	1
How to Get Your LLM to Generate Challenging Problems for Evaluation	Feb 20, 2025	Code CompletionMath	CodeCode Available	1
Reasoning with Reinforced Functional Token Tuning	Feb 19, 2025	Math	CodeCode Available	1
Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities	Feb 17, 2025	Code GenerationHumanEval	CodeCode Available	1
Thinking Preference Optimization	Feb 17, 2025	Math	CodeCode Available	1

Show:10 25 50

← PrevPage 12 of 64Next →

No leaderboard results yet.