SOTAVerified

Multiple-choice

Papers

Showing 651-660 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| Predicting the Difficulty of Multiple Choice Questions in a High-stakes Medical Exam | | 0 |
| Predictions from language models for multiple-choice tasks are not robust under variation of scoring methods | | 0 |
| Probabilistic Consensus through Ensemble Validation: A Framework for LLM Reliability | | 0 |
| Prompt Engineering and Calibration for Zero-Shot Commonsense Reasoning | | 0 |
| Prompting Implicit Discourse Relation Annotation | | 0 |
| Instruction Fine-Tuning: Does Prompt Loss Matter? | | 0 |
| ProverbEval: Exploring LLM Evaluation Challenges for Low-resource Language Understanding | | 0 |
| ConceptPsy: A Benchmark Suite with Conceptual Comprehensiveness in Psychology | | 0 |
| PUB: A Pragmatics Understanding Benchmark for Assessing LLMs' Pragmatics Capabilities | | 0 |
| Q-Bench-Video: Benchmarking the Video Quality Understanding of LMMs | | 0 |

No leaderboard results yet.