SOTAVerified

Multiple-choice

Papers

Showing 201–210 of 1107 papers

Title | Status | Hype
Can large language models reason about medical questions? | Code | 1
A BERT-based Distractor Generation Scheme with Multi-tasking and Negative Answer Training Strategies | Code | 1
AutoLogi: Automated Generation of Logic Puzzles for Evaluating Reasoning Abilities of Large Language Models | Code | 1
Multiple-Choice Questions are Efficient and Robust LLM Evaluators | Code | 1
NarrativeXL: A Large-scale Dataset For Long-Term Memory Models | Code | 1
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding | Code | 1
CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge | Code | 1
Complex Reasoning over Logical Queries on Commonsense Knowledge Graphs | Code | 1
Estimating Contamination via Perplexity: Quantifying Memorisation in Language Model Evaluation | Code | 1
Explaining NLP Models via Minimal Contrastive Editing (MiCE) | Code | 1
Page 21 of 111

No leaderboard results yet.