SOTAVerified

Multiple-choice

Papers

Showing 376–400 of 1107 papers

Title | Status | Hype
Evaluating Nuanced Bias in Large Language Model Free Response Answers | | 0
GANDALF: a General Character Name Description Dataset for Long Fiction | | 0
Evaluating Question Answering Evaluation | | 0
Break the Checkbox: Challenging Closed-Style Evaluations of Cultural Alignment in LLMs | | 0
Evalita-LLM: Benchmarking Large Language Models on Italian | | 0
BiRdQA: A Bilingual Dataset for Question Answering on Tricky Riddles | | 0
Evaluating the Potential of Leading Large Language Models in Reasoning Biology Questions | | 0
GEMeX: A Large-Scale, Groundable, and Explainable Medical VQA Benchmark for Chest X-ray Diagnosis | | 0
Establishing Task Scaling Laws via Compute-Efficient Model Ladders | | 0
Evaluating Vision-Language and Large Language Models for Automated Student Assessment in Indonesian Classrooms | | 0
Evaluating Visual and Cultural Interpretation: The K-Viscuit Benchmark with Human-VLM Collaboration | | 0
Evaluation of Automatically Generated Pronoun Reference Questions | | 0
Answer Uncertainty and Unanswerability in Multiple-Choice Machine Reading Comprehension | | 0
Analysis of the Cambridge Multiple-Choice Questions Reading Dataset with a Focus on Candidate Response Distribution | | 0
Examining the Behavior of LLM Architectures Within the Framework of Standardized National Exams in Brazil | | 0
Can AI Master Construction Management (CM)? Benchmarking State-of-the-Art Large Language Models on CM Certification Exams | | 0
EQUATOR: A Deterministic Framework for Evaluating LLM Reasoning with Open-Ended Questions. # v1.0.0-beta | | 0
Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation | | 0
ExplanationLP: Abductive Reasoning for Explainable Science Question Answering | | 0
Can ChatGPT pass the Vietnamese National High School Graduation Examination? | | 0
Answering questions by learning to rank -- Learning to rank by answering questions | | 0
Explore then Determine: A GNN-LLM Synergy Framework for Reasoning over Knowledge Graph | | 0
Can Crowdsourcing be used for Effective Annotation of Arabic? | | 0
Generalised Winograd Schema and its Contextuality | | 0
Enhancing Multiple-Choice Question Answering with Causal Knowledge | | 0
Page 16 of 45

No leaderboard results yet.