SOTAVerified

Multiple-choice

Papers


Showing 251–260 of 1107 papers

Title	Date	Tasks	Status	Hype
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding	Aug 17, 2023	Diagnostic, EgoSchema	Code Available	1
Evaluating GPT-3.5 and GPT-4 Models on Brazilian University Admission Exams	Mar 29, 2023	Multiple-choice	Code Available	1
WIQA: A dataset for "What if..." reasoning over procedural text	Sep 10, 2019	Multiple-choice	Code Available	1
WorldMedQA-V: a multilingual, multimodal medical examination dataset for multimodal language models evaluation	Oct 16, 2024	Benchmarking, Fairness	Code Available	1
FaceXBench: Evaluating Multimodal LLMs on Face Understanding	Jan 17, 2025	Fairness, Multiple-choice	Code Available	1
General-Purpose Question-Answering with Macaw	Sep 6, 2021	Generative Question Answering, Multiple-choice	Code Available	1
Language Model Uncertainty Quantification with Attention Chain	Mar 24, 2025	Computational Efficiency, Language Modeling	Code Available	1
Controlling Cloze-test Question Item Difficulty with PLM-based Surrogate Models for IRT Assessment	Mar 3, 2024	Cloze Test, Multiple-choice	Unverified	0
Contextual Response Interpretation for Automated Structured Interviews: A Case Study in Market Research	Apr 30, 2023	Marketing, Multiple-choice	Unverified	0
Analysing the Effect of Masking Length Distribution of MLM: An Evaluation Framework and Case Study on Chinese MRC Datasets	Sep 29, 2021	Language Modelling, Machine Reading Comprehension	Unverified	0


No leaderboard results yet.