
Multiple-choice

Papers


Showing 311–320 of 1107 papers

Title	Date	Tasks	Status	Hype
Adaptive Wizard for Removing Cross-Tier Misconfigurations in Active Directory	May 2, 2025	Multiple-choice	Unverified	0
Characterizing Large Language Models as Rationalizers of Knowledge-intensive Tasks	Nov 9, 2023	Multiple-choice, World Knowledge	Unverified	0
Changing Answer Order Can Decrease MMLU Accuracy	Jun 27, 2024	MMLU, Multiple-choice	Unverified	0
Evaluating Question Answering Evaluation	Nov 1, 2019	Answer Generation, Multiple-choice	Unverified	0
Are LLM-generated plain language summaries truly understandable? A large-scale crowdsourced evaluation	May 15, 2025	Informativeness, Multiple-choice	Unverified	0
CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding	Dec 16, 2024	Hallucination, Multiple-choice	Unverified	0
Adaptive Crowdsourcing Algorithms for the Bandit Survey Problem	Feb 13, 2013	Information Retrieval, Multiple-choice	Unverified	0
CFinBench: A Comprehensive Chinese Financial Benchmark for Large Language Models	Jul 2, 2024	Multiple-choice	Unverified	0
Evaluating multiple large language models in pediatric ophthalmology	Nov 7, 2023	Multiple-choice	Unverified	0
CBT-Bench: Evaluating Large Language Models on Assisting Cognitive Behavior Therapy	Oct 17, 2024	Multiple-choice, Response Generation	Unverified	0


Page 32 of 111

No leaderboard results yet.