SOTAVerified

Multiple-choice

Papers

Showing 81-90 of 1107 papers

Title | Status | Hype
IRLBench: A Multi-modal, Culturally Grounded, Parallel Irish-English Benchmark for Open-Ended LLM Reasoning Evaluation | Code | 1
Benchmarking AI scientists in omics data-driven biological research | Code | 1
Assessing the Chemical Intelligence of Large Language Models | Code | 1
ChartQAPro: A More Diverse and Challenging Benchmark for Chart Question Answering | Code | 1
Mobile-MMLU: A Mobile Intelligence Language Understanding Benchmark | Code | 1
Language Model Uncertainty Quantification with Attention Chain | Code | 1
Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models | Code | 1
MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research | Code | 1
CUPCase: Clinically Uncommon Patient Cases and Diagnoses Dataset | Code | 1
AutoLogi: Automated Generation of Logic Puzzles for Evaluating Reasoning Abilities of Large Language Models | Code | 1

No leaderboard results yet.