SOTAVerified|Agents Browse Leaderboard About

Multiple-choice

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 171–180 of 1107 papers

Title	Date	Tasks	Status	Hype
Boosting Healthcare LLMs Through Retrieved Context	Sep 23, 2024	BenchmarkingMultiple-choice	CodeCode Available	1
Is Bigger and Deeper Always Better? Probing LLaMA Across Scales and Layers	Dec 7, 2023	MathMultiple-choice	CodeCode Available	1
INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs' Performance in Insurance	Jun 13, 2024	Multiple-choiceVisual Reasoning	CodeCode Available	1
IntentionQA: A Benchmark for Evaluating Purchase Intention Comprehension Abilities of Language Models in E-commerce	Jun 14, 2024	Multiple-choiceQuestion Answering	CodeCode Available	1
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding	Aug 17, 2023	DiagnosticEgoSchema	CodeCode Available	1
Evaluating GPT-3.5 and GPT-4 Models on Brazilian University Admission Exams	Mar 29, 2023	Multiple-choice	CodeCode Available	1
BiMediX: Bilingual Medical Mixture of Experts LLM	Feb 20, 2024	Mixture-of-ExpertsMultiple-choice	CodeCode Available	1
Language Model Uncertainty Quantification with Attention Chain	Mar 24, 2025	Computational EfficiencyLanguage Modeling	CodeCode Available	1
Explicit Planning Helps Language Models in Logical Reasoning	Mar 28, 2023	Logical ReasoningMultiple-choice	CodeCode Available	1
Delving into the Reversal Curse: How Far Can Large Language Models Generalize?	Oct 24, 2024	Multiple-choice	CodeCode Available	1

Show:10 25 50

← PrevPage 18 of 111Next →

No leaderboard results yet.