SOTAVerified

Multiple-choice

Papers

Showing 101-125 of 1107 papers

| Title | Status | Hype |
| --- | --- | --- |
| ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic | Code | 1 |
| GPT Takes the Bar Exam | Code | 1 |
| Clues Before Answers: Generation-Enhanced Multiple-Choice QA | Code | 1 |
| CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning | Code | 1 |
| ChartQAPro: A More Diverse and Challenging Benchmark for Chart Question Answering | Code | 1 |
| InfiniBench: A Comprehensive Benchmark for Large Multimodal Models in Very Long Video Understanding | Code | 1 |
| LifeQA: A Real-life Dataset for Video Question Answering | Code | 1 |
| CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge | Code | 1 |
| Can large language models reason about medical questions? | Code | 1 |
| CoLoR-Filter: Conditional Loss Reduction Filtering for Targeted Language Model Pre-training | Code | 1 |
| Filter-then-Generate: Large Language Models with Structure-Text Adapter for Knowledge Graph Completion | Code | 1 |
| ARMAN: Pre-training with Semantically Selecting and Reordering of Sentences for Persian Abstractive Summarization | Code | 1 |
| FETA: A Benchmark for Few-Sample Task Transfer in Open-Domain Dialogue | Code | 1 |
| Conformal Prediction with Large Language Models for Multi-Choice Question Answering | Code | 1 |
| CUPCase: Clinically Uncommon Patient Cases and Diagnoses Dataset | Code | 1 |
| Counterfactual Variable Control for Robust and Interpretable Question Answering | Code | 1 |
| IRLBench: A Multi-modal, Culturally Grounded, Parallel Irish-English Benchmark for Open-Ended LLM Reasoning Evaluation | Code | 1 |
| FoodieQA: A Multimodal Dataset for Fine-Grained Understanding of Chinese Food Culture | Code | 1 |
| A BERT-based Distractor Generation Scheme with Multi-tasking and Negative Answer Training Strategies | Code | 1 |
| Daily-Omni: Towards Audio-Visual Reasoning with Temporal Alignment across Modalities | Code | 1 |
| Data Contamination Quiz: A Tool to Detect and Estimate Contamination in Large Language Models | Code | 1 |
| Assessing the Chemical Intelligence of Large Language Models | Code | 1 |
| Leaf: Multiple-Choice Question Generation | Code | 1 |
| Boosting Healthcare LLMs Through Retrieved Context | Code | 1 |
| A Hitchhiker's Guide to Fine-Grained Face Forgery Detection Using Common Sense Reasoning | Code | 1 |

No leaderboard results yet.