SOTAVerified

Multiple-choice

Papers

Showing 561–570 of 1107 papers

| Title | Status | Hype |
| SportQA: A Benchmark for Sports Understanding in Large Language Models | Code | 1 |
| Biomedical Entity Linking as Multiple Choice Question Answering | Code | 0 |
| ToMBench: Benchmarking Theory of Mind in Large Language Models | Code | 2 |
| tinyBenchmarks: evaluating LLMs with fewer examples | Code | 2 |
| Uncertainty-Aware Evaluation for Vision-Language Models | Code | 1 |
| Identifying Multiple Personalities in Large Language Models with External Evaluation | | 0 |
| "My Answer is C": First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models | Code | 0 |
| Ranking Large Language Models without Ground Truth | | 0 |
| Beyond Probabilities: Unveiling the Misalignment in Evaluating Large Language Models | | 0 |
| KorNAT: LLM Alignment Benchmark for Korean Social Values and Common Knowledge | | 0 |

No leaderboard results yet.