SOTAVerified

Multiple-choice

Papers

Showing 1076-1100 of 1107 papers

Title | Status | Hype
What Makes Reading Comprehension Questions Difficult? | Code | 0
Wait, that's not an option: LLMs Robustness with Incorrect Multiple-Choice Options | Code | 0
COLUMBUS: Evaluating COgnitive Lateral Understanding through Multiple-choice reBUSes | Code | 0
An Information-Theoretic Approach to Analyze NLP Classification Tasks | Code | 0
World Knowledge in Multiple Choice Reading Comprehension | Code | 0
NoVo: Norm Voting off Hallucinations with Attention Heads in Large Language Models | Code | 0
Are Large Language Models Consistent over Value-laden Questions? | Code | 0
Revisiting Visual Question Answering Baselines | Code | 0
LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models | Code | 0
BUCA: A Binary Classification Approach to Unsupervised Commonsense Question Answering | Code | 0
Automatic Generation and Evaluation of Reading Comprehension Test Items with Large Language Models | Code | 0
Are Vision LLMs Road-Ready? A Comprehensive Benchmark for Safety-Critical Driving Video Understanding | Code | 0
Abductive Commonsense Reasoning | Code | 0
A Multiple Choices Reading Comprehension Corpus for Vietnamese Language Education | Code | 0
When an LLM is apprehensive about its answers -- and when its uncertainty is justified | Code | 0
Grade Score: Quantifying LLM Performance in Option Selection | Code | 0
Look at the Text: Instruction-Tuned Language Models are More Robust Multiple Choice Selectors than You Think | Code | 0
This Is Your Doge, If It Please You: Exploring Deception and Robustness in Mixture of LLMs | Code | 0
StoryAnalogy: Deriving Story-level Analogies from Large Language Models to Unlock Analogical Understanding | Code | 0
Grounding Synthetic Data Evaluations of Language Models in Unsupervised Document Corpora | Code | 0
From Multiple-Choice to Extractive QA: A Case Study for English and Arabic | Code | 0
ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning | Code | 0
Strengthened Symbol Binding Makes Large Language Models Reliable Multiple-Choice Selectors | Code | 0
QMOS: Enhancing LLMs for Telecommunication with Question Masked loss and Option Shuffling | Code | 0
Truth Knows No Language: Evaluating Truthfulness Beyond English | Code | 0

No leaderboard results yet.