Multiple-choice

Papers

Showing 171–180 of 1107 papers

Title	Date	Tasks	Status	Hype
Mind the Confidence Gap: Overconfidence, Calibration, and Distractor Effects in Large Language Models	Feb 16, 2025	Multiple-choice	Code Available	1
LogiDynamics: Unraveling the Dynamics of Logical Inference in Large Language Model Reasoning	Feb 16, 2025	Analogical questions, In-Context Learning	Unverified	0
VisCon-100K: Leveraging Contextual Web Data for Fine-tuning Vision Language Models	Feb 14, 2025	Image Captioning, Large Language Model	Unverified	0
Objective quantification of mood states using large language models	Feb 13, 2025	Multiple-choice	Unverified	0
Truth Knows No Language: Evaluating Truthfulness Beyond English	Feb 13, 2025	Informativeness, Machine Translation	Code Available	0
SB-Bench: Stereotype Bias Benchmark for Large Multimodal Models	Feb 12, 2025	Fairness, Multiple-choice	Unverified	0
A Semantic Parsing Algorithm to Solve Linear Ordering Problems	Feb 12, 2025	Multiple-choice, Semantic Parsing	Unverified	0
Break the Checkbox: Challenging Closed-Style Evaluations of Cultural Alignment in LLMs	Feb 12, 2025	Multiple-choice, Survey	Unverified	0
PerCul: A Story-Driven Cultural Evaluation of LLMs in Persian	Feb 11, 2025	Multiple-choice	Unverified	0
Tokenization Standards for Linguistic Integrity: Turkish as a Benchmark	Feb 10, 2025	MMLU, Morphological Analysis	Unverified	0

No leaderboard results yet.