Multiple-choice

Papers

Showing 51–60 of 1107 papers

Title	Date	Tasks	Status	Hype
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark	Nov 28, 2023	3D Question Answering (3D-QA), Diagnostic	Code Available	2
All in One: Exploring Unified Video-Language Pre-training	Mar 14, 2022	All, Language Modelling	Code Available	2
BixBench: a Comprehensive Benchmark for LLM-based Agents in Computational Biology	Feb 28, 2025	Multiple-choice, scientific discovery	Code Available	2
Improving Medical Reasoning through Retrieval and Self-Reflection with Retrieval-Augmented Large Language Models	Jan 27, 2024	Medical Question Answering, Multiple-choice	Code Available	2
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models	Aug 5, 2024	Image Comprehension, Multiple-choice	Code Available	2
ClinicalLab: Aligning Agents for Multi-Departmental Clinical Diagnostics in the Real World	Jun 19, 2024	Diagnostic, Multiple-choice	Code Available	2
CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models	Sep 4, 2024	GSM8K, Math	Code Available	2
SafetyBench: Evaluating the Safety of Large Language Models	Sep 13, 2023	Multiple-choice	Code Available	2
EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning	May 7, 2025	Multiple-choice, Question Answering	Code Available	2
CyberMetric: A Benchmark Dataset based on Retrieval-Augmented Generation for Evaluating LLMs in Cybersecurity Knowledge	Feb 12, 2024	General Knowledge, Multiple-choice	Code Available	2

No leaderboard results yet.