SOTAVerified

Multiple-choice

Papers

Showing 101–110 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| InstructionBench: An Instructional Video Understanding Benchmark | | 0 |
| Can AI Master Construction Management (CM)? Benchmarking State-of-the-Art Large Language Models on CM Certification Exams | | 0 |
| From ChatGPT to DeepSeek AI: A Comprehensive Analysis of Evolution, Deviation, and Future Implications in AI-Language Models | | 0 |
| VEGAS: Towards Visually Explainable and Grounded Artificial Social Intelligence | Code | 0 |
| Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 | Code | 2 |
| ACPBench Hard: Unrestrained Reasoning about Action, Change, and Planning | | 0 |
| Question-Aware Knowledge Graph Prompting for Enhancing Large Language Models | Code | 0 |
| Order Independence With Finetuning | | 0 |
| Mobile-MMLU: A Mobile Intelligence Language Understanding Benchmark | Code | 1 |
| Language Model Uncertainty Quantification with Attention Chain | Code | 1 |

No leaderboard results yet.