SOTAVerified

Multiple-choice

Papers

Showing 191-200 of 1107 papers

Title | Status | Hype
Benchmarking Large Language Models on CMExam -- A Comprehensive Chinese Medical Exam Dataset | Code | 1
Conformal Prediction with Large Language Models for Multi-Choice Question Answering | Code | 1
NarrativeXL: A Large-scale Dataset For Long-Term Memory Models | Code | 1
VNHSGE: VietNamese High School Graduation Examination Dataset for Large Language Models | Code | 1
M3KE: A Massive Multi-Level Multi-Subject Knowledge Evaluation Benchmark for Chinese Large Language Models | Code | 1
Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting | Code | 1
MindGames: Targeting Theory of Mind in Large Language Models with Dynamic Epistemic Modal Logic | Code | 1
Evaluating GPT-3.5 and GPT-4 Models on Brazilian University Admission Exams | Code | 1
Explicit Planning Helps Language Models in Logical Reasoning | Code | 1
Long Horizon Temperature Scaling | Code | 1
Page 20 of 111

No leaderboard results yet.