Multiple-choice

Papers

Showing 331–340 of 1107 papers

Title	Date	Tasks	Status	Hype
LongPerceptualThoughts: Distilling System-2 Reasoning for System-1 Perception	Apr 21, 2025	Math, MMLU	Unverified	0
Are Vision LLMs Road-Ready? A Comprehensive Benchmark for Safety-Critical Driving Video Understanding	Apr 20, 2025	Autonomous Driving, Image Captioning	Code Available	0
FarsEval-PKBETS: A new diverse benchmark for evaluating Persian large language models	Apr 20, 2025	Descriptive, Ethics	Unverified	0
Assessing AI-Generated Questions' Alignment with Cognitive Frameworks in Educational Assessment	Apr 19, 2025	Classification, Multiple-choice	Unverified	0
DMind Benchmark: Toward a Holistic Assessment of LLM Capabilities across the Web3 Domain	Apr 18, 2025	Multiple-choice	Unverified	0
D-GEN: Automatic Distractor Generation and Evaluation for Reliable Assessment of Generative Model	Apr 18, 2025	Distractor Generation, Multiple-choice	Unverified	0
Benchmarking Next-Generation Reasoning-Focused Large Language Models in Ophthalmology: A Head-to-Head Evaluation on 5,888 Items	Apr 15, 2025	Benchmarking, Multiple-choice	Unverified	0
AgMMU: A Comprehensive Agricultural Multimodal Understanding and Reasoning Benchmark	Apr 14, 2025	Management, Multiple-choice	Unverified	0
Large Language Models Could Be Rote Learners	Apr 11, 2025	Memorization, MMLU	Unverified	0
Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation	Apr 9, 2025	Multiple-choice	Code Available	0

No leaderboard results yet.