
Multiple-choice

Papers


Showing 451–460 of 1107 papers

Title	Date	Tasks	Status	Hype
MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding	Jun 13, 2024	Multiple-choice, Scene Understanding	Code Available	1
INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs' Performance in Insurance	Jun 13, 2024	Multiple-choice, Visual Reasoning	Code Available	1
OLMES: A Standard for Language Model Evaluations	Jun 12, 2024	Language Modeling, Language Modelling	Unverified	0
Open-LLM-Leaderboard: From Multi-choice to Open-style Questions for LLMs Evaluation, Benchmark, and Arena	Jun 11, 2024	Multiple-choice, Selection bias	Code Available	2
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs	Jun 11, 2024	Multiple-choice, Question Answering	Code Available	5
BertaQA: How Much Do Language Models Know About Local Culture?	Jun 11, 2024	Multiple-choice, Transfer Learning	Code Available	0
Towards a Personal Health Large Language Model	Jun 10, 2024	Language Modeling, Language Modelling	Unverified	0
Decision-Making Behavior Evaluation Framework for LLMs under Uncertain Context	Jun 10, 2024	Decision Making, Multiple-choice	Unverified	0
Investigating and Addressing Hallucinations of LLMs in Tasks Involving Negation	Jun 8, 2024	Abstractive Text Summarization, Dialogue Generation	Unverified	0
Do LLMs Recognize me, When I is not me: Assessment of LLMs Understanding of Turkish Indexical Pronouns in Indexical Shift Contexts	Jun 8, 2024	Machine Translation, Multiple-choice	Unverified	0


No leaderboard results yet.