Multiple-choice

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 101–150 of 1107 papers

Title	Date	Tasks	Status	Hype	Score
ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic	Feb 20, 2024	ArabicMMLULanguage Model Evaluation	CodeCode Available	1	5
Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting	May 7, 2023	Multiple-choice	CodeCode Available	1	5
IRLBench: A Multi-modal, Culturally Grounded, Parallel Irish-English Benchmark for Open-Ended LLM Reasoning Evaluation	May 16, 2025	Multiple-choice	CodeCode Available	1	5
Delving into the Reversal Curse: How Far Can Large Language Models Generalize?	Oct 24, 2024	Multiple-choice	CodeCode Available	1	5
AdaLoGN: Adaptive Logic Graph Network for Reasoning-Based Machine Reading Comprehension	Mar 16, 2022	Logical ReasoningMachine Reading Comprehension	CodeCode Available	1	5
Let Androids Dream of Electric Sheep: A Human-like Image Implication Understanding and Reasoning Framework	May 22, 2025	Multiple-choiceVisual Question Answering (VQA)	CodeCode Available	1	5
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages	Nov 25, 2024	AllLong Question Answer	CodeCode Available	1	5
LibriSQA: A Novel Dataset and Framework for Spoken Question Answering with Large Language Models	Aug 20, 2023	Multiple-choiceQuestion Answering	CodeCode Available	1	5
LLMEval-Med: A Real-world Clinical Benchmark for Medical LLMs with Physician Validation	Jun 4, 2025	Multiple-choice	CodeCode Available	1	5
LLMs Are Biased Towards Output Formats! Systematically Evaluating and Mitigating Output Format Bias of LLMs	Aug 16, 2024	Instruction FollowingMultiple-choice	CodeCode Available	1	5
Data Contamination Quiz: A Tool to Detect and Estimate Contamination in Large Language Models	Nov 10, 2023	GSM8KMemorization	CodeCode Available	1	5
ARMAN: Pre-training with Semantically Selecting and Reordering of Sentences for Persian Abstractive Summarization	Sep 9, 2021	Abstractive Text SummarizationDecoder	CodeCode Available	1	5
Evaluating GPT-3.5 and GPT-4 Models on Brazilian University Admission Exams	Mar 29, 2023	Multiple-choice	CodeCode Available	1	5
Do Large Language Models Understand Conversational Implicature -- A case study with a chinese sitcom	Apr 30, 2024	ImplicaturesMultiple-choice	CodeCode Available	1	5
Latxa: An Open Language Model and Evaluation Suite for Basque	Mar 29, 2024	Language ModelingLanguage Modelling	CodeCode Available	1	5
CHOICE: Benchmarking the Remote Sensing Capabilities of Large Vision-Language Models	Nov 27, 2024	BenchmarkingEarth Observation	CodeCode Available	1	5
EduQG: A Multi-format Multiple Choice Dataset for the Educational Domain	Oct 12, 2022	Distractor GenerationMultiple-choice	CodeCode Available	1	5
Constructing Narrative Event Evolutionary Graph for Script Event Prediction	May 14, 2018	Graph Neural NetworkMultiple-choice	CodeCode Available	1	5
Conformal Prediction with Large Language Models for Multi-Choice Question Answering	May 28, 2023	Conformal PredictionMultiple-choice	CodeCode Available	1	5
Complex Reasoning over Logical Queries on Commonsense Knowledge Graphs	Mar 12, 2024	Knowledge GraphsMultiple-choice	CodeCode Available	1	5
Counterfactual Variable Control for Robust and Interpretable Question Answering	Oct 12, 2020	Causal Inferencecounterfactual	CodeCode Available	1	5
Assessing the Chemical Intelligence of Large Language Models	May 12, 2025	Multiple-choice	CodeCode Available	1	5
Enhancing Knowledge Tracing with Concept Map and Response Disentanglement	Aug 23, 2024	DisentanglementKnowledge Tracing	CodeCode Available	1	5
MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research	Mar 17, 2025	ArticlesBenchmarking	CodeCode Available	1	5
Estimating Contamination via Perplexity: Quantifying Memorisation in Language Model Evaluation	Sep 19, 2023	Language Model EvaluationLanguage Modeling	CodeCode Available	1	5
MindGames: Targeting Theory of Mind in Large Language Models with Dynamic Epistemic Modal Logic	May 5, 2023	Epistemic ReasoningLanguage Modeling	CodeCode Available	1	5
IndicNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages	Nov 8, 2020	Genre classificationMultiple-choice	CodeCode Available	1	5
CoLoR-Filter: Conditional Loss Reduction Filtering for Targeted Language Model Pre-training	Jun 15, 2024	Domain AdaptationLanguage Modeling	CodeCode Available	1	5
A Hitchhikers Guide to Fine-Grained Face Forgery Detection Using Common Sense Reasoning	Oct 1, 2024	Common Sense ReasoningDeepFake Detection	CodeCode Available	1	5
IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models	Mar 23, 2024	Common Sense ReasoningIn-Context Learning	CodeCode Available	1	5
CodeApex: A Bilingual Programming Evaluation Benchmark for Large Language Models	Sep 5, 2023	Code GenerationMultiple-choice	CodeCode Available	1	5
CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge	Nov 2, 2018	Common Sense ReasoningMultiple-choice	CodeCode Available	1	5
InfiniBench: A Comprehensive Benchmark for Large Multimodal Models in Very Long Video Understanding	Jun 28, 2024	Multiple-choiceVideo Understanding	CodeCode Available	1	5
Clues Before Answers: Generation-Enhanced Multiple-Choice QA	Apr 30, 2022	DecoderMultiple-choice	CodeCode Available	1	5
Ranked Voting based Self-Consistency of Large Language Models	May 16, 2025	Multiple-choiceOpen-Ended Question Answering	CodeCode Available	1	5
An Open Source Data Contamination Report for Large Language Models	Oct 26, 2023	HellaSwagLanguage Modeling	CodeCode Available	1	5
Annealed Winner-Takes-All for Motion Forecasting	Sep 17, 2024	AllAutonomous Driving	CodeCode Available	1	5
ChartQAPro: A More Diverse and Challenging Benchmark for Chart Question Answering	Apr 7, 2025	Chart Question AnsweringChart Understanding	CodeCode Available	1	5
Annealed Multiple Choice Learning: Overcoming limitations of Winner-takes-all with annealing	Jul 22, 2024	AllDiversity	CodeCode Available	1	5
CMMU: A Benchmark for Chinese Multi-modal Multi-type Question Understanding and Reasoning	Jan 25, 2024	Multiple-choicePosition	CodeCode Available	1	5
HCQA @ Ego4D EgoSchema Challenge 2024	Jun 22, 2024	Caption Generation	CodeCode Available	1	5
Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models	Mar 20, 2025	Multiple-choiceVideo Understanding	CodeCode Available	1	5
INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs' Performance in Insurance	Jun 13, 2024	Multiple-choiceVisual Reasoning	CodeCode Available	1	5
An MRC Framework for Semantic Role Labeling	Sep 14, 2021	Computational EfficiencyMachine Reading Comprehension	CodeCode Available	1	5
African or European Swallow? Benchmarking Large Vision-Language Models for Fine-Grained Object Classification	Jun 20, 2024	BenchmarkingClassification	CodeCode Available	1	5
An In-depth Look at Gemini's Language Abilities	Dec 18, 2023	Instruction FollowingMath	CodeCode Available	1	5
General-Purpose Question-Answering with Macaw	Sep 6, 2021	Generative Question AnsweringMultiple-choice	CodeCode Available	1	5
Generating Distractors for Reading Comprehension Questions from Real Examinations	Sep 8, 2018	DecoderDistractor Generation	CodeCode Available	1	5
GPT as Knowledge Worker: A Zero-Shot Evaluation of (AI)CPA Capabilities	Jan 11, 2023	Multiple-choice	CodeCode Available	1	5
FoodieQA: A Multimodal Dataset for Fine-Grained Understanding of Chinese Food Culture	Jun 16, 2024	DiversityMultiple-choice	CodeCode Available	1	5

Show:10 25 50

← PrevPage 3 of 23Next →

No leaderboard results yet.