SOTAVerified

Multiple-choice

Papers

Showing 341-350 of 1107 papers

Title | Status | Hype
Leaving the barn door open for Clever Hans: Simple features predict LLM benchmark answers | Code | 0
DyePack: Provably Flagging Test Set Contamination in LLMs Using Backdoors | Code | 0
BERT-based distractor generation for Swedish reading comprehension questions using a small-scale dataset | Code | 0
SCoRE: Benchmarking Long-Chain Reasoning in Commonsense Scenarios | Code | 0
DREAM: A Challenge Dataset and Models for Dialogue-Based Reading Comprehension | Code | 0
ElimiNet: A Model for Eliminating Options for Reading Comprehension with Multiple Choice Questions | Code | 0
BertaQA: How Much Do Language Models Know About Local Culture? | Code | 0
EMBRACE: Evaluation and Modifications for Boosting RACE | Code | 0
Language Models as Knowledge Bases for Visual Word Sense Disambiguation | Code | 0
It's Not Easy Being Wrong: Large Language Models Struggle with Process of Elimination Reasoning | Code | 0
Page 35 of 111

No leaderboard results yet.