SOTAVerified

Multiple-choice

Papers

Showing 151–160 of 1107 papers

Title | Status | Hype
ExplaGraphs: An Explanation Graph Generation Task for Structured Commonsense Reasoning | Code | 1
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information? | Code | 1
LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models | Code | 1
Explicit Planning Helps Language Models in Logical Reasoning | Code | 1
African or European Swallow? Benchmarking Large Vision-Language Models for Fine-Grained Object Classification | Code | 1
Benchmarking AI scientists in omics data-driven biological research | Code | 1
An MRC Framework for Semantic Role Labeling | Code | 1
Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions | Code | 1
E-EVAL: A Comprehensive Chinese K-12 Education Evaluation Benchmark for Large Language Models | Code | 1
EduQG: A Multi-format Multiple Choice Dataset for the Educational Domain | Code | 1

No leaderboard results yet.