Multiple-choice

Papers

Title	Date	Tasks	Status
How Far Can Off-the-Shelf Multimodal Large Language Models Go in Online Episodic Memory Question Answering?	Jun 19, 2025	Multiple-choice, Question Answering	Unverified
How Many Workers to Ask? Adaptive Exploration for Collecting High Quality Labels	Nov 1, 2014	Multiple-choice	Unverified
How Susceptible are LLMs to Influence in Prompts?	Aug 17, 2024	Multiple-choice, Question Answering	Unverified
How well do LLMs reason over tabular data, really?	May 12, 2025	Missing Values, Multiple-choice	Unverified
HRCA+: Advanced Multiple-choice Machine Reading Comprehension Method	Jun 1, 2022	Machine Reading Comprehension, Multiple-choice	Unverified
Humanity's Last Exam	Jan 24, 2025	Humanity's Last Exam, Language Modeling	Unverified
Humans and Large Language Models in Clinical Decision Support: A Study with Medical Calculators	Nov 8, 2024	Decision Making, Multiple-choice	Unverified
Hypothesis Testing for Quantifying LLM-Human Misalignment in Multiple Choice Settings	Jun 17, 2025	Decision Making, Language Modeling	Unverified
Identification of mental fatigue in language comprehension tasks based on EEG and deep learning	Apr 14, 2021	Classification, EEG	Unverified
Treatment Effects with Multidimensional Unobserved Heterogeneity: Identification of the Marginal Treatment Effect	Sep 23, 2022	Multiple-choice	Unverified
Identifying Multiple Personalities in Large Language Models with External Evaluation	Feb 22, 2024	Multiple-choice	Unverified
Identity Lock: Locking API Fine-tuned LLMs With Identity-based Wake Words	Mar 10, 2025	Multiple-choice	Unverified
IIE-NLP-Eyas at SemEval-2021 Task 4: Enhancing PLM for ReCAM with Special Tokens, Re-Ranking, Siamese Encoders and Back Translation	Feb 25, 2021	Multiple-choice, Question Answering	Unverified
IIE-NLP-NUT at SemEval-2020 Task 4: Guiding PLM with Prompt Template Reconstruction Strategy for ComVE	Jul 2, 2020	Multiple-choice, Question Answering	Unverified
IllusionBench: A Large-scale and Comprehensive Benchmark for Visual Illusion Understanding in Vision-Language Models	Jan 1, 2025	Hallucination, Multiple-choice	Unverified
Image Aesthetic Reasoning: A New Benchmark for Medical Image Screening with MLLMs	May 29, 2025	Image Generation, Multiple-choice	Unverified
Imagery as Inquiry: Exploring A Multimodal Dataset for Conversational Recommendation	May 23, 2024	Conversational Recommendation, Multiple-choice	Unverified
Improved Few-Shot Image Classification Through Multiple-Choice Questions	Jul 23, 2024	Articles, Few-Shot Image Classification	Unverified
Improvement/Extension of Modular Systems as Combinatorial Reengineering (Survey)	Apr 17, 2013	Combinatorial Optimization, Multiple-choice	Unverified
Improving Automated Distractor Generation for Math Multiple-choice Questions with Overgenerate-and-rank	Apr 19, 2024	Distractor Generation, Math	Unverified
Improving LLM First-Token Predictions in Multiple-Choice Question Answering via Prefilling Attack	May 21, 2025	Multiple-choice, Multiple Choice Question Answering (MCQA)	Unverified
Analysing the Effect of Masking Length Distribution of MLM: An Evaluation Framework and Case Study on Chinese MRC Datasets	Sep 29, 2021	Language Modelling, Machine Reading Comprehension	Unverified
Improving the Production Efficiency and Well-formedness of Automatically-Generated Multiple-Choice Cloze Vocabulary Questions	May 1, 2020	Multiple-choice	Unverified
In Case You Missed It: ARC 'Challenge' Is Not That Challenging	Dec 23, 2024	ARC, Multiple-choice	Unverified
TVBench: Redesigning Video-Language Evaluation	Oct 10, 2024	Multiple-choice, Open-Ended Question Answering	Unverified
Indirect Identification of Psychosocial Risks from Natural Language	Apr 30, 2020	Multiple-choice, Topic Models	Unverified
Inferring from Logits: Exploring Best Practices for Decoding-Free Generative Candidate Selection	Jan 28, 2025	Multiple-choice	Unverified
Two-Turn Debate Doesn't Help Humans Answer Hard Reading Comprehension Questions	Oct 19, 2022	Language Modeling, Language Modelling	Unverified
InnerThoughts: Disentangling Representations and Predictions in Large Language Models	Jan 29, 2025	Multiple-choice, Position	Unverified
InstructionBench: An Instructional Video Understanding Benchmark	Apr 7, 2025	Common Sense Reasoning, Multiple-choice	Unverified
Instruction Tuning and CoT Prompting for Contextual Medical QA with LLMs	Jun 13, 2025	Medical Question Answering, MedQA	Unverified
Instruction Tuning on Public Government and Cultural Data for Low-Resource Language: a Case Study in Kazakh	Feb 19, 2025	Instruction Following, Multiple-choice	Unverified
Uhura: A Benchmark for Evaluating Scientific Question Answering and Truthfulness in Low-Resource African Languages	Dec 1, 2024	ARC, Multiple-choice	Unverified
Interpretable Multi-Step Reasoning with Knowledge Extraction on Complex Healthcare Question Answering	Aug 6, 2020	Multiple-choice, Question Answering	Unverified
Investigating and Addressing Hallucinations of LLMs in Tasks Involving Negation	Jun 8, 2024	Abstractive Text Summarization, Dialogue Generation	Unverified
Investigating Data Contamination in Modern Benchmarks for Large Language Models	Nov 16, 2023	Common Sense Reasoning, MMLU	Unverified
Self-Assessment Tests are Unreliable Measures of LLM Personality	Sep 15, 2023	Multiple-choice	Unverified
Investigating the Effectiveness of ChatGPT in Mathematical Reasoning and Problem Solving: Evidence from the Vietnamese National High School Graduation Examination	Jun 10, 2023	Math, Mathematical Reasoning	Unverified
Investigating Uncertainty Calibration of Aligned Language Models under the Multiple-Choice Setting	Oct 18, 2023	Multiple-choice	Unverified
WikiMixQA: A Multimodal Benchmark for Question Answering over Tables and Charts	Jun 18, 2025	document understanding, Multiple-choice	Unverified
ISAAQ -- Mastering Textbook Questions with Pre-trained Transformers and Bottom-Up and Top-Down Attention	Oct 1, 2020	Multiple-choice, Question Answering	Unverified
ISAAQ - Mastering Textbook Questions with Pre-trained Transformers and Bottom-Up and Top-Down Attention	Nov 1, 2020	Multiple-choice, Question Answering	Unverified
Is This Collection Worth My LLM's Time? Automatically Measuring Information Potential in Text Corpora	Feb 19, 2025	Articles, Multiple-choice	Unverified
An Algorithm for Generating Gap-Fill Multiple Choice Questions of an Expert System	Sep 17, 2021	Multiple-choice, software testing	Unverified
It is Too Many Options: Pitfalls of Multiple-Choice Questions in Generative AI and Medical Education	Mar 13, 2025	Multiple-choice	Unverified
Winning Amazon KDD Cup'24	Aug 5, 2024	Data Augmentation, Multiple-choice	Unverified
KMMLU: Measuring Massive Multitask Language Understanding in Korean	Feb 18, 2024	kmmlu, Language Model Evaluation	Unverified
Knowledge-Driven Distractor Generation for Cloze-style Multiple Choice Questions	Apr 21, 2020	Distractor Generation, Learning-To-Rank	Unverified
Knowledge Questions from Knowledge Graphs	Oct 31, 2016	Knowledge Graphs, Multiple-choice	Unverified
Knowledge Retrieval Based on Generative AI	Jan 8, 2025	Large Language Model, Multiple-choice	Unverified
