SOTAVerified

Multiple-choice

Papers

Showing 41–50 of 1107 papers

Title	Date	Tasks	Status	Hype
LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding	Jul 22, 2024	Multiple-choice, Question Answering	Code Available	2
MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity	Jul 22, 2024	Diversity, Multiple-choice	Code Available	2
MMSci: A Dataset for Graduate-Level Multi-Discipline Multimodal Scientific Understanding	Jul 6, 2024	Articles, Instruction Following	Code Available	2
ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World	Jun 19, 2024	Diagnostic, Multiple-choice	Code Available	2
CHiSafetyBench: A Chinese Hierarchical Safety Benchmark for Large Language Models	Jun 14, 2024	Multiple-choice, Question Answering	Code Available	2
Open-LLM-Leaderboard: From Multi-choice to Open-style Questions for LLMs Evaluation, Benchmark, and Arena	Jun 11, 2024	Multiple-choice, Selection bias	Code Available	2
Automated Evaluation of Retrieval-Augmented Language Models with Task-Specific Exam Generation	May 22, 2024	Informativeness, Language Modeling	Code Available	2
Self-Reflection in LLM Agents: Effects on Problem-Solving Performance	May 5, 2024	Multiple-choice	Code Available	2
PLAYER*: Enhancing LLM-based Multi-Agent Communication and Interaction in Murder Mystery Games	Apr 26, 2024	Decision Making, Language Modeling	Code Available	2
An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM	Mar 27, 2024	Language Modeling, Language Modelling	Code Available	2
