Multiple-choice

Papers

Showing 1–10 of 1107 papers

Title	Date	Tasks	Status	Hype
MiniCPM-V: A GPT-4V Level MLLM on Your Phone	Aug 3, 2024	Hallucination, Multiple-choice	Code Available	12
HealthBench: Evaluating Large Language Models Towards Improved Human Health	May 13, 2025	Instruction Following, Multiple-choice	Code Available	7
LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks	Dec 19, 2024	8k, In-Context Learning	Code Available	5
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs	Jun 11, 2024	Multiple-choice, Question Answering	Code Available	5
MMBench: Is Your Multi-modal Model an All-around Player?	Jul 12, 2023	All, Instruction Following	Code Available	5
VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation	May 20, 2025	MME, Multiple-choice	Code Available	4
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens	Apr 4, 2024	Language Modeling, Language Modelling	Code Available	4
BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text	Mar 27, 2024	Articles, Language Modeling	Code Available	4
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning	Mar 5, 2024	Multiple-choice	Code Available	4
I Think, Therefore I am: Benchmarking Awareness of Large Language Models Using AwareBench	Jan 31, 2024	Benchmarking, Multiple-choice	Code Available	4

No leaderboard results yet.