Multiple-choice

Papers

Showing 311–320 of 1107 papers

Title	Date	Tasks	Status	Hype
ZeroTuning: Unlocking the Initial Token's Power to Enhance Large Language Models Without Training	May 16, 2025	Multiple-choice, text-classification	Unverified	0
MedGUIDE: Benchmarking Clinical Decision-Making in Large Language Models	May 16, 2025	Benchmarking, Decision Making	Unverified	0
Are LLM-generated plain language summaries truly understandable? A large-scale crowdsourced evaluation	May 15, 2025	Informativeness, Multiple-choice	Unverified	0
The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think	May 15, 2025	Multiple-choice	Unverified	0
SafePath: Conformal Prediction for Safe LLM-Based Autonomous Navigation	May 14, 2025	Autonomous Driving, Autonomous Navigation	Unverified	0
KRISTEVA: Close Reading as a Novel Task for Benchmarking Interpretive Reasoning	May 14, 2025	Benchmarking, MMLU	Unverified	0
Grounding Synthetic Data Evaluations of Language Models in Unsupervised Document Corpora	May 13, 2025	Benchmarking, Diagnostic	Code Available	0
VCRBench: Exploring Long-form Causal Reasoning Capabilities of Large Video Language Models	May 13, 2025	Form, Multiple-choice	Code Available	0
How well do LLMs reason over tabular data, really?	May 12, 2025	Missing Values, Multiple-choice	Unverified	0
Healthy LLMs? Benchmarking LLM Knowledge of UK Government Public Health Information	May 9, 2025	Benchmarking, Form	Unverified	0

No leaderboard results yet.