Multiple-choice

Papers

Showing 241–250 of 1107 papers

Title	Date	Tasks	Status	Hype
MMLU-CF: A Contamination-free Multi-task Language Understanding Benchmark	Dec 19, 2024	MMLU, Multiple-choice	Code Available	2
LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks	Dec 19, 2024	8k, In-Context Learning	Code Available	5
Are You Doubtful? Oh, It Might Be Difficult Then! Exploring the Use of Model Uncertainty for Question Difficulty Estimation	Dec 16, 2024	Multiple-choice	Unverified	0
CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding	Dec 16, 2024	Hallucination, Multiple-choice	Unverified	0
Auto-bidding in real-time auctions via Oracle Imitation Learning (OIL)	Dec 16, 2024	Imitation Learning, Multiple-choice	Unverified	0
Seeing the Forest and the Trees: Solving Visual Graph and Tree Based Data Structure Problems using Large Multimodal Models	Dec 15, 2024	Multiple-choice	Unverified	0
MedG-KRP: Medical Graph Knowledge Representation Probing	Dec 14, 2024	Multiple-choice, Multiple Choice Question Answering (MCQA)	Code Available	0
Do LLMs Act as Repositories of Causal Knowledge?	Dec 14, 2024	Causal Inference, Multiple-choice	Unverified	0
A recent evaluation on the performance of LLMs on radiation oncology physics using questions of randomly shuffled options	Dec 14, 2024	Multiple-choice	Unverified	0
Superhuman performance of a large language model on the reasoning tasks of a physician	Dec 14, 2024	Diagnostic, Language Modeling	Unverified	0
