Multiple-choice

Papers

Showing 21–30 of 1107 papers

Title	Date	Tasks	Status	Hype
Evaluating Vision-Language and Large Language Models for Automated Student Assessment in Indonesian Classrooms	Jun 5, 2025	Multiple-choice	Unverified	0
LLMEval-Med: A Real-world Clinical Benchmark for Medical LLMs with Physician Validation	Jun 4, 2025	Multiple-choice	Code Available	1
Do Large Language Models Know Folktales? A Case Study of Yokai in Japanese Folktales	Jun 4, 2025	Multiple-choice	Unverified	0
Performance of leading large language models in May 2025 in Membership of the Royal College of General Practitioners-style examination questions: a cross-sectional analysis	Jun 3, 2025	Multiple-choice	Unverified	0
Hanfu-Bench: A Multimodal Benchmark on Cross-Temporal Cultural Understanding and Transcreation	Jun 2, 2025	Multiple-choice, Question Answering	Unverified	0
Polishing Every Facet of the GEM: Testing Linguistic Competence of LLMs and Humans in Korean	Jun 2, 2025	Multiple-choice	Code Available	1
PersianMedQA: Language-Centric Evaluation of LLMs in the Persian Medical Domain	May 30, 2025	Instruction Following, Multiple-choice	Unverified	0
VUDG: A Dataset for Video Understanding Domain Generalization	May 30, 2025	Domain Generalization, Multiple-choice	Unverified	0
ClinBench-HPB: A Clinical Benchmark for Evaluating LLMs in Hepato-Pancreato-Biliary Diseases	May 30, 2025	Medical Question Answering, Multiple-choice	Unverified	0
Beyond Multiple Choice: Evaluating Steering Vectors for Adaptive Free-Form Summarization	May 30, 2025	Form, Language Modeling	Unverified	0

No leaderboard results yet.