SOTAVerified

Multiple-choice

Papers

Showing 271-280 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| Different Questions, Different Models: Fine-Grained Evaluation of Uncertainty and Calibration in Clinical QA with LLMs | | 0 |
| A Shortcut-aware Video-QA Benchmark for Physical Understanding via Minimal Video Pairs | | 0 |
| VersaVid-R1: A Versatile Video Understanding and Reasoning Model from Question Answering to Captioning Tasks | | 0 |
| ARGUS: Hallucination and Omission Evaluation in Video-LLMs | | 0 |
| Evaluating LLM-corrupted Crowdsourcing Data Without Ground Truth | | 0 |
| Evaluating Vision-Language and Large Language Models for Automated Student Assessment in Indonesian Classrooms | | 0 |
| Multiple-Choice Question Generation Using Large Language Models: Methodology and Educator Insights | | 0 |
| Do Large Language Models Know Folktales? A Case Study of Yokai in Japanese Folktales | | 0 |
| Performance of leading large language models in May 2025 in Membership of the Royal College of General Practitioners-style examination questions: a cross-sectional analysis | | 0 |
| Hanfu-Bench: A Multimodal Benchmark on Cross-Temporal Cultural Understanding and Transcreation | | 0 |

No leaderboard results yet.