SOTAVerified

Multiple-choice

Papers

Showing 141-150 of 1107 papers

Title | Status | Hype
Multiple-Choice Questions are Efficient and Robust LLM Evaluators | Code | 1
SciFIBench: Benchmarking Large Multimodal Models for Scientific Figure Interpretation | Code | 1
THRONE: An Object-based Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models | Code | 1
Do Large Language Models Understand Conversational Implicature -- A case study with a chinese sitcom | Code | 1
Latxa: An Open Language Model and Evaluation Suite for Basque | Code | 1
Non-Linear Inference Time Intervention: Improving LLM Truthfulness | Code | 1
IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models | Code | 1
Complex Reasoning over Logical Queries on Commonsense Knowledge Graphs | Code | 1
Unfamiliar Finetuning Examples Control How Language Models Hallucinate | Code | 1
To Generate or to Retrieve? On the Effectiveness of Artificial Contexts for Medical Open-Domain Question Answering | Code | 1

No leaderboard results yet.