SOTAVerified

Multiple-choice

Papers

Showing 201-225 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation | Code | 1 |
| Leveraging Large Language Models for Learning Complex Legal Concepts through Storytelling | Code | 1 |
| ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic | Code | 1 |
| FoodieQA: A Multimodal Dataset for Fine-Grained Understanding of Chinese Food Culture | Code | 1 |
| Constructing Narrative Event Evolutionary Graph for Script Event Prediction | Code | 1 |
| Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models | Code | 1 |
| Latxa: An Open Language Model and Evaluation Suite for Basque | Code | 1 |
| A BERT-based Distractor Generation Scheme with Multi-tasking and Negative Answer Training Strategies. | Code | 1 |
| AutoLogi: Automated Generation of Logic Puzzles for Evaluating Reasoning Abilities of Large Language Models | Code | 1 |
| CC-Riddle: A Question Answering Dataset of Chinese Character Riddles | Code | 1 |
| Conformal Prediction with Large Language Models for Multi-Choice Question Answering | Code | 1 |
| Leaf: Multiple-Choice Question Generation | Code | 1 |
| General-Purpose Question-Answering with Macaw | Code | 1 |
| SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models | Code | 1 |
| Leveraging Large Language Models for Multiple Choice Question Answering | Code | 1 |
| GPT as Knowledge Worker: A Zero-Shot Evaluation of (AI)CPA Capabilities | Code | 1 |
| CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge | Code | 1 |
| CoLoR-Filter: Conditional Loss Reduction Filtering for Targeted Language Model Pre-training | Code | 1 |
| Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting | Code | 1 |
| Complex Reasoning over Logical Queries on Commonsense Knowledge Graphs | Code | 1 |
| CodeApex: A Bilingual Programming Evaluation Benchmark for Large Language Models | Code | 1 |
| HCQA @ Ego4D EgoSchema Challenge 2024 | Code | 1 |
| Knowledge Graph-Augmented Abstractive Summarization with Semantic-Driven Cloze Reward | Code | 1 |
| Language Model Uncertainty Quantification with Attention Chain | Code | 1 |
| CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning | Code | 1 |

No leaderboard results yet.