
Multiple-choice

Papers

Showing 101–125 of 1107 papers

Title | Status | Hype
ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic | Code | 1
FaceXBench: Evaluating Multimodal LLMs on Face Understanding | Code | 1
Explicit Planning Helps Language Models in Logical Reasoning | Code | 1
LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models | Code | 1
AdaLoGN: Adaptive Logic Graph Network for Reasoning-Based Machine Reading Comprehension | Code | 1
Evaluating the Knowledge Dependency of Questions | Code | 1
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages | Code | 1
From Machine Reading Comprehension to Dialogue State Tracking: Bridging the Gap | Code | 1
Generating Distractors for Reading Comprehension Questions from Real Examinations | Code | 1
FETA: A Benchmark for Few-Sample Task Transfer in Open-Domain Dialogue | Code | 1
HCQA @ Ego4D EgoSchema Challenge 2024 | Code | 1
LifeQA: A Real-life Dataset for Video Question Answering | Code | 1
EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding | Code | 1
E-EVAL: A Comprehensive Chinese K-12 Education Evaluation Benchmark for Large Language Models | Code | 1
Enhancing Human-like Multi-Modal Reasoning: A New Challenging Dataset and Comprehensive Framework | Code | 1
Do Large Language Models Understand Conversational Implicature -- A case study with a chinese sitcom | Code | 1
InfiniBench: A Comprehensive Benchmark for Large Multimodal Models in Very Long Video Understanding | Code | 1
INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs' Performance in Insurance | Code | 1
A BERT-based Distractor Generation Scheme with Multi-tasking and Negative Answer Training Strategies | Code | 1
JMedLoRA:Medical Domain Adaptation on Japanese Large Language Models using Instruction-tuning | Code | 1
Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting | Code | 1
Assessing the Chemical Intelligence of Large Language Models | Code | 1
EduQG: A Multi-format Multiple Choice Dataset for the Educational Domain | Code | 1
Leaf: Multiple-Choice Question Generation | Code | 1
Enhancing Knowledge Tracing with Concept Map and Response Disentanglement | Code | 1
Page 5 of 45

No leaderboard results yet.