SOTAVerified|Agents Browse Leaderboard About

Multiple-choice

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 81–90 of 1107 papers

Title	Date	Tasks	Status	Hype
IRLBench: A Multi-modal, Culturally Grounded, Parallel Irish-English Benchmark for Open-Ended LLM Reasoning Evaluation	May 16, 2025	Multiple-choice	CodeCode Available	1
Benchmarking AI scientists in omics data-driven biological research	May 13, 2025	BenchmarkingMultiple-choice	CodeCode Available	1
Assessing the Chemical Intelligence of Large Language Models	May 12, 2025	Multiple-choice	CodeCode Available	1
ChartQAPro: A More Diverse and Challenging Benchmark for Chart Question Answering	Apr 7, 2025	Chart Question AnsweringChart Understanding	CodeCode Available	1
Mobile-MMLU: A Mobile Intelligence Language Understanding Benchmark	Mar 26, 2025	MMLUMultiple-choice	CodeCode Available	1
Language Model Uncertainty Quantification with Attention Chain	Mar 24, 2025	Computational EfficiencyLanguage Modeling	CodeCode Available	1
Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models	Mar 20, 2025	Multiple-choiceVideo Understanding	CodeCode Available	1
MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research	Mar 17, 2025	ArticlesBenchmarking	CodeCode Available	1
CUPCase: Clinically Uncommon Patient Cases and Diagnoses Dataset	Mar 8, 2025	Multiple-choice	CodeCode Available	1
AutoLogi: Automated Generation of Logic Puzzles for Evaluating Reasoning Abilities of Large Language Models	Feb 24, 2025	Logical ReasoningMultiple-choice	CodeCode Available	1

Show:10 25 50

← PrevPage 9 of 111Next →

No leaderboard results yet.