SOTAVerified

Multiple-choice

Papers

Showing 681-690 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| Large Language Models Are Not Robust Multiple Choice Selectors | Code | 1 |
| An Automatic Evaluation Framework for Multi-turn Medical Consultations Capabilities of Large Language Models | | 0 |
| CodeApex: A Bilingual Programming Evaluation Benchmark for Large Language Models | Code | 1 |
| INCEPTNET: Precise And Early Disease Detection Application For Medical Images Analyses | Code | 0 |
| Generalised Winograd Schema and its Contextuality | | 0 |
| The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants | Code | 2 |
| Spoken Language Intelligence of Large Language Models for Language Learning | Code | 0 |
| Large Language Models Sensitivity to The Order of Options in Multiple-Choice Questions | | 0 |
| LibriSQA: A Novel Dataset and Framework for Spoken Question Answering with Large Language Models | Code | 1 |
| FinEval: A Chinese Financial Domain Knowledge Evaluation Benchmark for Large Language Models | Code | 2 |
Page 69 of 111

No leaderboard results yet.