SOTAVerified

Multiple-choice

Papers

Showing 451-460 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding | Code | 1 |
| INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs' Performance in Insurance | Code | 1 |
| OLMES: A Standard for Language Model Evaluations | | 0 |
| Open-LLM-Leaderboard: From Multi-choice to Open-style Questions for LLMs Evaluation, Benchmark, and Arena | Code | 2 |
| VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs | Code | 5 |
| BertaQA: How Much Do Language Models Know About Local Culture? | Code | 0 |
| Towards a Personal Health Large Language Model | | 0 |
| Decision-Making Behavior Evaluation Framework for LLMs under Uncertain Context | | 0 |
| Investigating and Addressing Hallucinations of LLMs in Tasks Involving Negation | | 0 |
| Do LLMs Recognize me, When I is not me: Assessment of LLMs Understanding of Turkish Indexical Pronouns in Indexical Shift Contexts | | 0 |

No leaderboard results yet.