Math

Papers

Showing 1501–1525 of 1596 papers

Title	Date	Tasks	Status	Hype	Score
Investigating the Effectiveness of ChatGPT in Mathematical Reasoning and Problem Solving: Evidence from the Vietnamese National High School Graduation Examination	Jun 10, 2023	Math, Mathematical Reasoning	Unverified	0	0
Investigating the Efficacy of Large Language Models in Reflective Assessment Methods through Chain of Thoughts Prompting	Sep 30, 2023	Math	Unverified	0	0
Thinking Outside the (Gray) Box: A Context-Based Score for Assessing Value and Originality in Neural Text Generation	Feb 18, 2025	Diversity, Math	Unverified	0	0
IsoBench: Benchmarking Multimodal Foundation Models on Isomorphic Representations	Apr 1, 2024	Benchmarking, Math	Unverified	0	0
Solving Functional Optimization with Deep Networks and Variational Principles	Oct 8, 2024	Math	Unverified	0	0
Is your LLM trapped in a Mental Set? Investigative study on how mental sets affect the reasoning capabilities of LLMs	Jan 21, 2025	GSM8K, In-Context Learning	Unverified	0	0
Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist	Jul 11, 2024	GSM8K, Math	Unverified	0	0
Iterative Reasoning Preference Optimization	Apr 30, 2024	ARC, GSM8K	Unverified	0	0
Yi-Lightning Technical Report	Dec 2, 2024	Chatbot, Large Language Model	Unverified	0	0
Adaptive Guidance Accelerates Reinforcement Learning of Reasoning Models	Jun 16, 2025	Math, reinforcement-learning	Unverified	0	0
JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation	Oct 22, 2024	Math	Unverified	0	0
Beyond Captioning: Task-Specific Prompting for Improved VLM Performance in Mathematical Reasoning	Oct 8, 2024	Image Retrieval, Math	Unverified	0	0
Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking	Mar 25, 2025	Math, Reinforcement Learning (RL)	Unverified	0	0
Kappa Learning: A New Method for Measuring Similarity Between Educational Items Using Performance Data	Dec 20, 2018	Clustering, Math	Unverified	0	0
Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning	Mar 4, 2024	GSM8K, Math	Unverified	0	0
Thought-Augmented Policy Optimization: Bridging External Guidance and Internal Capabilities	May 21, 2025	Math, Reinforcement Learning (RL)	Unverified	0	0
Knowledge or Reasoning? A Close Look at How LLMs Think Across Domains	Jun 2, 2025	Math, Reinforcement Learning (RL)	Unverified	0	0
Knowledge Tagging System on Math Questions via LLMs with Flexible Demonstration Retriever	Jun 19, 2024	Math, Semantic Similarity	Unverified	0	0
Knowledge Tagging with Large Language Model based Multi-Agent System	Sep 12, 2024	Language Modeling, Language Modelling	Unverified	0	0
Kokoyi: Executable LaTeX for End-to-end Deep Learning	Sep 29, 2021	Deep Learning, Math	Unverified	0	0
L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models	Sep 29, 2023	Code Generation, Math	Unverified	0	0
Better Process Supervision with Bi-directional Rewarding Signals	Mar 6, 2025	Language Modeling, Language Modelling	Unverified	0	0
Adapting the LodView RDF Browser for Navigation over the Multilingual Linguistic Linked Open Data Cloud	Aug 28, 2022	Math	Unverified	0	0
Benchmarking Reasoning Robustness in Large Language Models	Mar 6, 2025	Benchmarking, Math	Unverified	0	0
THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models	Apr 17, 2025	Benchmarking, Math	Unverified	0	0