SOTAVerified

Multiple-choice

Papers

Showing 411-420 of 1107 papers

| Title | Status | Hype |
| FactTest: Factuality Testing in Large Language Models with Finite-Sample and Distribution-Free Guarantees | | 0 |
| Enhancing LLM Evaluations: The Garbling Trick | | 0 |
| FAMULUS: Interactive Annotation and Feedback Generation for Teaching Diagnostic Reasoning | | 0 |
| FarsEval-PKBETS: A new diverse benchmark for evaluating Persian large language models | | 0 |
| Answering Chinese Elementary School Social Study Multiple Choice Questions | | 0 |
| Enhancing Event Causality Identification with Rationale and Structure-Aware Causal Question Answering | | 0 |
| FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding | | 0 |
| Enhancing Distractor Generation for Multiple-Choice Questions with Retrieval Augmented Pretraining and Knowledge Graph Integration | | 0 |
| Few-Shot Image Classification and Segmentation as Visual Question Answering Using Vision-Language Models | | 0 |
| Bilingual Evaluation of Language Models on General Knowledge in University Entrance Exams with Minimal Contamination | | 0 |
Page 42 of 111

No leaderboard results yet.