SOTAVerified

Multiple-choice

Papers

Showing 1051-1100 of 1107 papers

Title | Status | Hype
Beyond English-Only Reading Comprehension: Experiments in Zero-Shot Multilingual Transfer for Bulgarian | Code | 0
A quantitative study of NLP approaches to question difficulty estimation | Code | 0
Unified Question Answering in Slovene | Code | 0
Neural Natural Logic Inference for Interpretable Question Answering | Code | 0
Limited Ability of LLMs to Simulate Human Psychological Behaviours: a Psychometric Analysis | Code | 0
FrenchMedMCQA: A French Multiple-Choice Question Answering Dataset for Medical domain | Code | 0
Real-Time Automated Answer Scoring | Code | 0
Automated Generation and Tagging of Knowledge Components from Multiple-Choice Questions | Code | 0
LiveQA: A Question Answering Dataset over Sports Live | Code | 0
CASE: Commonsense-Augmented Score with an Expanded Answer Space | Code | 0
Which Shortcut Solution Do Question Answering Models Prefer to Learn? | Code | 0
From Recognition to Cognition: Visual Commonsense Reasoning | Code | 0
FSBench: A Figure Skating Benchmark for Advancing Artistic Sports Understanding | Code | 0
LLaVA-OneVision: Easy Visual Task Transfer | Code | 0
Fusing Models with Complementary Expertise | Code | 0
A Benchmark for Long-Form Medical Question Answering | Code | 0
Fùxì: A Benchmark for Evaluating Language Models on Ancient Chinese Text Understanding and Generation | Code | 0
ReCoMIF: Reading comprehension based multi-source information fusion network for Chinese spoken language understanding | Code | 0
NLP at UC Santa Cruz at SemEval-2024 Task 5: Legal Answer Validation using Few-Shot Multi-Choice QA | Code | 0
Gendered Pronoun Resolution using BERT and an extractive question answering formulation | Code | 0
Noise Injection Reveals Hidden Capabilities of Sandbagging Language Models | Code | 0
Spoken Language Intelligence of Large Language Models for Language Learning | Code | 0
ReGraP-LLaVA: Reasoning enabled Graph-based Personalized Large Language and Vision Assistant | Code | 0
LLMs Are Not Intelligent Thinkers: Introducing Mathematical Topic Tree Benchmark for Comprehensive Evaluation of LLMs | Code | 0
Balancing Rigor and Utility: Mitigating Cognitive Biases in Large Language Models for Multiple-Choice Questions | Code | 0
What Makes Reading Comprehension Questions Difficult? | Code | 0
Wait, that's not an option: LLMs Robustness with Incorrect Multiple-Choice Options | Code | 0
COLUMBUS: Evaluating COgnitive Lateral Understanding through Multiple-choice reBUSes | Code | 0
An Information-Theoretic Approach to Analyze NLP Classification Tasks | Code | 0
World Knowledge in Multiple Choice Reading Comprehension | Code | 0
NoVo: Norm Voting off Hallucinations with Attention Heads in Large Language Models | Code | 0
Are Large Language Models Consistent over Value-laden Questions? | Code | 0
Revisiting Visual Question Answering Baselines | Code | 0
LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models | Code | 0
BUCA: A Binary Classification Approach to Unsupervised Commonsense Question Answering | Code | 0
Automatic Generation and Evaluation of Reading Comprehension Test Items with Large Language Models | Code | 0
Are Vision LLMs Road-Ready? A Comprehensive Benchmark for Safety-Critical Driving Video Understanding | Code | 0
Abductive Commonsense Reasoning | Code | 0
A Multiple Choices Reading Comprehension Corpus for Vietnamese Language Education | Code | 0
When an LLM is apprehensive about its answers -- and when its uncertainty is justified | Code | 0
Grade Score: Quantifying LLM Performance in Option Selection | Code | 0
Look at the Text: Instruction-Tuned Language Models are More Robust Multiple Choice Selectors than You Think | Code | 0
This Is Your Doge, If It Please You: Exploring Deception and Robustness in Mixture of LLMs | Code | 0
StoryAnalogy: Deriving Story-level Analogies from Large Language Models to Unlock Analogical Understanding | Code | 0
Grounding Synthetic Data Evaluations of Language Models in Unsupervised Document Corpora | Code | 0
From Multiple-Choice to Extractive QA: A Case Study for English and Arabic | Code | 0
ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning | Code | 0
Strengthened Symbol Binding Makes Large Language Models Reliable Multiple-Choice Selectors | Code | 0
QMOS: Enhancing LLMs for Telecommunication with Question Masked loss and Option Shuffling | Code | 0
Truth Knows No Language: Evaluating Truthfulness Beyond English | Code | 0
Page 22 of 23

No leaderboard results yet.