SOTAVerified

Multiple-choice

Papers

Showing 1051–1075 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| Beyond English-Only Reading Comprehension: Experiments in Zero-Shot Multilingual Transfer for Bulgarian | Code | 0 |
| A quantitative study of NLP approaches to question difficulty estimation | Code | 0 |
| Unified Question Answering in Slovene | Code | 0 |
| Neural Natural Logic Inference for Interpretable Question Answering | Code | 0 |
| Limited Ability of LLMs to Simulate Human Psychological Behaviours: a Psychometric Analysis | Code | 0 |
| FrenchMedMCQA: A French Multiple-Choice Question Answering Dataset for Medical domain | Code | 0 |
| Real-Time Automated Answer Scoring | Code | 0 |
| Automated Generation and Tagging of Knowledge Components from Multiple-Choice Questions | Code | 0 |
| LiveQA: A Question Answering Dataset over Sports Live | Code | 0 |
| CASE: Commonsense-Augmented Score with an Expanded Answer Space | Code | 0 |
| Which Shortcut Solution Do Question Answering Models Prefer to Learn? | Code | 0 |
| From Recognition to Cognition: Visual Commonsense Reasoning | Code | 0 |
| FSBench: A Figure Skating Benchmark for Advancing Artistic Sports Understanding | Code | 0 |
| LLaVA-OneVision: Easy Visual Task Transfer | Code | 0 |
| Fusing Models with Complementary Expertise | Code | 0 |
| A Benchmark for Long-Form Medical Question Answering | Code | 0 |
| Fùxì: A Benchmark for Evaluating Language Models on Ancient Chinese Text Understanding and Generation | Code | 0 |
| ReCoMIF: Reading comprehension based multi-source information fusion network for Chinese spoken language understanding | Code | 0 |
| NLP at UC Santa Cruz at SemEval-2024 Task 5: Legal Answer Validation using Few-Shot Multi-Choice QA | Code | 0 |
| Gendered Pronoun Resolution using BERT and an extractive question answering formulation | Code | 0 |
| Noise Injection Reveals Hidden Capabilities of Sandbagging Language Models | Code | 0 |
| Spoken Language Intelligence of Large Language Models for Language Learning | Code | 0 |
| ReGraP-LLaVA: Reasoning enabled Graph-based Personalized Large Language and Vision Assistant | Code | 0 |
| LLMs Are Not Intelligent Thinkers: Introducing Mathematical Topic Tree Benchmark for Comprehensive Evaluation of LLMs | Code | 0 |
| Balancing Rigor and Utility: Mitigating Cognitive Biases in Large Language Models for Multiple-Choice Questions | Code | 0 |

No leaderboard results yet.