SOTAVerified

Multiple-choice

Papers

Showing 231–240 of 1107 papers

Title | Status | Hype
CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning | Code | 1
Enhancing Human-like Multi-Modal Reasoning: A New Challenging Dataset and Comprehensive Framework | Code | 1
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding | Code | 1
CodeApex: A Bilingual Programming Evaluation Benchmark for Large Language Models | Code | 1
Enhancing Knowledge Tracing with Concept Map and Response Disentanglement | Code | 1
TIMEDIAL: Temporal Commonsense Reasoning in Dialog | Code | 1
LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models | Code | 1
To Generate or to Retrieve? On the Effectiveness of Artificial Contexts for Medical Open-Domain Question Answering | Code | 1
CoLoR-Filter: Conditional Loss Reduction Filtering for Targeted Language Model Pre-training | Code | 1
Do Large Language Models Understand Conversational Implicature -- A case study with a chinese sitcom | Code | 1

No leaderboard results yet.