| Title | Date | Topics | Code | Count |
| --- | --- | --- | --- | --- |
| Unmasking Deceptive Visuals: Benchmarking Multimodal Large Language Models on Misleading Chart Question Answering | Mar 23, 2025 | Benchmarking, Chart Question Answering | Unverified | 0 |
| Evaluating Clinical Competencies of Large Language Models with a General Practice Benchmark | Mar 22, 2025 | Multiple-choice | Unverified | 0 |
| SaudiCulture: A Benchmark for Evaluating Large Language Models Cultural Competence within Saudi Arabia | Mar 21, 2025 | Multiple-choice | Unverified | 0 |
| Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models | Mar 20, 2025 | Multiple-choice, Video Understanding | Code Available | 1 |
| Fùxì: A Benchmark for Evaluating Language Models on Ancient Chinese Text Understanding and Generation | Mar 20, 2025 | Multiple-choice, Text Generation | Code Available | 0 |
| AutoDrive-QA- Automated Generation of Multiple-Choice Questions for Autonomous Driving Datasets Using Large Vision-Language Models | Mar 20, 2025 | Autonomous Driving, Multiple-choice | Unverified | 0 |
| CodeReviewQA: The Code Review Comprehension Assessment for Large Language Models | Mar 20, 2025 | Code Generation, Multiple-choice | Unverified | 0 |
| FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding | Mar 19, 2025 | Benchmarking, Multiple-choice | Unverified | 0 |
| VisNumBench: Evaluating Number Sense of Multimodal Large Language Models | Mar 19, 2025 | Multiple-choice | Unverified | 0 |
| How much do LLMs learn from negative examples? | Mar 18, 2025 | Multiple-choice, Question Answering | Code Available | 0 |