SOTAVerified

Multiple-choice

Papers

Showing 131–140 of 1107 papers

Title | Status | Hype
LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models | Code | 1
Explaining NLP Models via Minimal Contrastive Editing (MiCE) | Code | 1
Enhancing Human-like Multi-Modal Reasoning: A New Challenging Dataset and Comprehensive Framework | Code | 1
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding | Code | 1
Enhancing Knowledge Tracing with Concept Map and Response Disentanglement | Code | 1
EduQG: A Multi-format Multiple Choice Dataset for the Educational Domain | Code | 1
Ranked Voting based Self-Consistency of Large Language Models | Code | 1
E-EVAL: A Comprehensive Chinese K-12 Education Evaluation Benchmark for Large Language Models | Code | 1
Estimating Contamination via Perplexity: Quantifying Memorisation in Language Model Evaluation | Code | 1
An Open Source Data Contamination Report for Large Language Models | Code | 1

No leaderboard results yet.