SOTAVerified

Multiple-choice

Papers

Showing 326-350 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| AraSTEM: A Native Arabic Multiple Choice Question Benchmark for Evaluating LLMs Knowledge In STEM Subjects | | 0 |
| Can Multimodal LLMs do Visual Temporal Understanding and Reasoning? The answer is No! | | 0 |
| GeoSQA: A Benchmark for Scenario-based Question Answering in the Geography Domain at High School Level | | 0 |
| FAMULUS: Interactive Annotation and Feedback Generation for Teaching Diagnostic Reasoning | | 0 |
| Aquila-Med LLM: Pioneering Full-Process Open-Source Medical Language Models | | 0 |
| Can Generative Pre-trained Transformers (GPT) Pass Assessments in Higher Education Programming Courses? | | 0 |
| Exposing the Limits of Video-Text Models through Contrast Sets | | 0 |
| FactTest: Factuality Testing in Large Language Models with Finite-Sample and Distribution-Free Guarantees | | 0 |
| FarsEval-PKBETS: A new diverse benchmark for evaluating Persian large language models | | 0 |
| Can Crowdsourcing be used for Effective Annotation of Arabic? | | 0 |
| Can ChatGPT pass the Vietnamese National High School Graduation Examination? | | 0 |
| Applying IRT to Distinguish Between Human and Generative AI Responses to Multiple-Choice Assessments | | 0 |
| Enhancing lexical-based approach with external knowledge for Vietnamese multiple-choice machine reading comprehension | | 0 |
| Can AI Master Construction Management (CM)? Benchmarking State-of-the-Art Large Language Models on CM Certification Exams | | 0 |
| A Joint-Reasoning based Disease Q&A System | | 0 |
| Analysis of the Cambridge Multiple-Choice Questions Reading Dataset with a Focus on Candidate Response Distribution | | 0 |
| Answer Uncertainty and Unanswerability in Multiple-Choice Machine Reading Comprehension | | 0 |
| Answer Uncertainty and Unanswerability in Multiple-Choice Machine Reading Comprehension | | 0 |
| Bridging the Language Gap: Knowledge Injected Multilingual Question Answering | | 0 |
| Bridging Information-Seeking Human Gaze and Machine Reading Comprehension | | 0 |
| Adapting Vision-Language Models for Evaluating World Models | | 0 |
| How Additional Knowledge can Improve Natural Language Commonsense Question Answering? | | 0 |
| Fine-tuning BERT with Focus Words for Explanation Regeneration | | 0 |
| From Generalist to Specialist: Improving Large Language Models for Medical Physics Using ARCoT | | 0 |
| Break the Checkbox: Challenging Closed-Style Evaluations of Cultural Alignment in LLMs | | 0 |

No leaderboard results yet.