
Multiple-choice

Papers


Showing 271–280 of 1107 papers

Title	Date	Tasks	Status	Hype
Different Questions, Different Models: Fine-Grained Evaluation of Uncertainty and Calibration in Clinical QA with LLMs	Jun 12, 2025	Multiple-choice, Question Answering	Unverified	0
A Shortcut-aware Video-QA Benchmark for Physical Understanding via Minimal Video Pairs	Jun 11, 2025	Multiple-choice	Unverified	0
VersaVid-R1: A Versatile Video Understanding and Reasoning Model from Question Answering to Captioning Tasks	Jun 10, 2025	Multiple-choice, Open-Ended Question Answering	Unverified	0
ARGUS: Hallucination and Omission Evaluation in Video-LLMs	Jun 9, 2025	Descriptive, Form	Unverified	0
Evaluating LLM-corrupted Crowdsourcing Data Without Ground Truth	Jun 8, 2025	Multiple-choice	Unverified	0
Evaluating Vision-Language and Large Language Models for Automated Student Assessment in Indonesian Classrooms	Jun 5, 2025	Multiple-choice	Unverified	0
Multiple-Choice Question Generation Using Large Language Models: Methodology and Educator Insights	Jun 5, 2025	Multiple-choice, Question Answering	Unverified	0
Do Large Language Models Know Folktales? A Case Study of Yokai in Japanese Folktales	Jun 4, 2025	Multiple-choice	Unverified	0
Performance of leading large language models in May 2025 in Membership of the Royal College of General Practitioners-style examination questions: a cross-sectional analysis	Jun 3, 2025	Multiple-choice	Unverified	0
Hanfu-Bench: A Multimodal Benchmark on Cross-Temporal Cultural Understanding and Transcreation	Jun 2, 2025	Multiple-choice, Question Answering	Unverified	0


