SOTAVerified

Multiple-choice

Papers

Showing 221–230 of 1107 papers

Title | Status | Hype
Estimating Contamination via Perplexity: Quantifying Memorisation in Language Model Evaluation | Code | 1
Enhancing Knowledge Tracing with Concept Map and Response Disentanglement | Code | 1
Evaluating GPT-3.5 and GPT-4 Models on Brazilian University Admission Exams | Code | 1
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding | Code | 1
ARMAN: Pre-training with Semantically Selecting and Reordering of Sentences for Persian Abstractive Summarization | Code | 1
SportQA: A Benchmark for Sports Understanding in Large Language Models | Code | 1
Enhancing Human-like Multi-Modal Reasoning: A New Challenging Dataset and Comprehensive Framework | Code | 1
STARC: Structured Annotations for Reading Comprehension | Code | 1
Evaluating language models as risk scores | Code | 1
Do Large Language Models Understand Conversational Implicature -- A case study with a chinese sitcom | Code | 1

No leaderboard results yet.