SOTAVerified

Math

Papers

Showing 1501–1525 of 1596 papers

Title | Status | Hype
Investigating the Effectiveness of ChatGPT in Mathematical Reasoning and Problem Solving: Evidence from the Vietnamese National High School Graduation Examination | | 0
Investigating the Efficacy of Large Language Models in Reflective Assessment Methods through Chain of Thoughts Prompting | | 0
Thinking Outside the (Gray) Box: A Context-Based Score for Assessing Value and Originality in Neural Text Generation | | 0
IsoBench: Benchmarking Multimodal Foundation Models on Isomorphic Representations | | 0
Solving Functional Optimization with Deep Networks and Variational Principles | | 0
Is your LLM trapped in a Mental Set? Investigative study on how mental sets affect the reasoning capabilities of LLMs | | 0
Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist | | 0
Iterative Reasoning Preference Optimization | | 0
Yi-Lightning Technical Report | | 0
Adaptive Guidance Accelerates Reinforcement Learning of Reasoning Models | | 0
JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation | | 0
Beyond Captioning: Task-Specific Prompting for Improved VLM Performance in Mathematical Reasoning | | 0
Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking | | 0
Kappa Learning: A New Method for Measuring Similarity Between Educational Items Using Performance Data | | 0
Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning | | 0
Thought-Augmented Policy Optimization: Bridging External Guidance and Internal Capabilities | | 0
Knowledge or Reasoning? A Close Look at How LLMs Think Across Domains | | 0
Knowledge Tagging System on Math Questions via LLMs with Flexible Demonstration Retriever | | 0
Knowledge Tagging with Large Language Model based Multi-Agent System | | 0
Kokoyi: Executable LaTeX for End-to-end Deep Learning | | 0
L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models | | 0
Better Process Supervision with Bi-directional Rewarding Signals | | 0
Adapting the LodView RDF Browser for Navigation over the Multilingual Linguistic Linked Open Data Cloud | | 0
Benchmarking Reasoning Robustness in Large Language Models | | 0
THOUGHTTERMINATOR: Benchmarking, Calibrating, and Mitigating Overthinking in Reasoning Models | | 0
Page 61 of 64

No leaderboard results yet.