Multiple-choice

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 651–700 of 1107 papers

Title	Date	Tasks	Status
Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?	Jun 6, 2024	Multiple-choiceQuestion Answering	—Unverified
Analyzing the Performance of ChatGPT in Cardiology and Vascular Pathologies	Apr 15, 2023	Language ModelingLanguage Modelling	—Unverified
Healthy LLMs? Benchmarking LLM Knowledge of UK Government Public Health Information	May 9, 2025	BenchmarkingForm	—Unverified
HFL-RC System at SemEval-2018 Task 11: Hybrid Multi-Aspects Model for Commonsense Reading Comprehension	Mar 15, 2018	Multiple-choiceReading Comprehension	—Unverified
Hierarchical Divide-and-Conquer for Fine-Grained Alignment in LLM-Based Medical Evaluation	Jan 12, 2025	AttributeMultiple-choice	—Unverified
HindiLLM: Large Language Model for Hindi	Dec 29, 2024	Language ModelingLanguage Modelling	—Unverified
Analyzing Multiple-Choice Reading and Listening Comprehension Tests	Jul 3, 2023	Multiple-choiceReading Comprehension	—Unverified
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites	Apr 25, 2024	4kLanguage Modeling	—Unverified
How Far Can Off-the-Shelf Multimodal Large Language Models Go in Online Episodic Memory Question Answering?	Jun 19, 2025	Multiple-choiceQuestion Answering	—Unverified
How Many Workers to Ask? Adaptive Exploration for Collecting High Quality Labels	Nov 1, 2014	Multiple-choice	—Unverified
How Susceptible are LLMs to Influence in Prompts?	Aug 17, 2024	Multiple-choiceQuestion Answering	—Unverified
How well do LLMs reason over tabular data, really?	May 12, 2025	Missing ValuesMultiple-choice	—Unverified
HRCA+: Advanced Multiple-choice Machine Reading Comprehension Method	Jun 1, 2022	Machine Reading ComprehensionMultiple-choice	—Unverified
Humanity's Last Exam	Jan 24, 2025	Humanity's Last ExamLanguage Modeling	—Unverified
Humans and Large Language Models in Clinical Decision Support: A Study with Medical Calculators	Nov 8, 2024	Decision MakingMultiple-choice	—Unverified
Hypothesis Testing for Quantifying LLM-Human Misalignment in Multiple Choice Settings	Jun 17, 2025	Decision MakingLanguage Modeling	—Unverified
Identification of mental fatigue in language comprehension tasks based on EEG and deep learning	Apr 14, 2021	ClassificationEEG	—Unverified
Treatment Effects with Multidimensional Unobserved Heterogeneity: Identification of the Marginal Treatment Effect	Sep 23, 2022	Multiple-choice	—Unverified
Identifying Multiple Personalities in Large Language Models with External Evaluation	Feb 22, 2024	Multiple-choice	—Unverified
Identity Lock: Locking API Fine-tuned LLMs With Identity-based Wake Words	Mar 10, 2025	Multiple-choice	—Unverified
IIE-NLP-Eyas at SemEval-2021 Task 4: Enhancing PLM for ReCAM with Special Tokens, Re-Ranking, Siamese Encoders and Back Translation	Feb 25, 2021	Multiple-choiceQuestion Answering	—Unverified
IIE-NLP-NUT at SemEval-2020 Task 4: Guiding PLM with Prompt Template Reconstruction Strategy for ComVE	Jul 2, 2020	Multiple-choiceQuestion Answering	—Unverified
IllusionBench: A Large-scale and Comprehensive Benchmark for Visual Illusion Understanding in Vision-Language Models	Jan 1, 2025	HallucinationMultiple-choice	—Unverified
Image Aesthetic Reasoning: A New Benchmark for Medical Image Screening with MLLMs	May 29, 2025	Image GenerationMultiple-choice	—Unverified
Imagery as Inquiry: Exploring A Multimodal Dataset for Conversational Recommendation	May 23, 2024	Conversational RecommendationMultiple-choice	—Unverified
Improved Few-Shot Image Classification Through Multiple-Choice Questions	Jul 23, 2024	ArticlesFew-Shot Image Classification	—Unverified
Improvement/Extension of Modular Systems as Combinatorial Reengineering (Survey)	Apr 17, 2013	Combinatorial OptimizationMultiple-choice	—Unverified
Improving Automated Distractor Generation for Math Multiple-choice Questions with Overgenerate-and-rank	Apr 19, 2024	Distractor GenerationMath	—Unverified
Improving LLM First-Token Predictions in Multiple-Choice Question Answering via Prefilling Attack	May 21, 2025	Multiple-choiceMultiple Choice Question Answering (MCQA)	—Unverified
Analysing the Effect of Masking Length Distribution of MLM: An Evaluation Framework and Case Study on Chinese MRC Datasets	Sep 29, 2021	Language ModellingMachine Reading Comprehension	—Unverified
Improving the Production Efficiency and Well-formedness of Automatically-Generated Multiple-Choice Cloze Vocabulary Questions	May 1, 2020	Multiple-choice	—Unverified
In Case You Missed It: ARC 'Challenge' Is Not That Challenging	Dec 23, 2024	ARCMultiple-choice	—Unverified
TVBench: Redesigning Video-Language Evaluation	Oct 10, 2024	Multiple-choiceOpen-Ended Question Answering	—Unverified
Indirect Identification of Psychosocial Risks from Natural Language	Apr 30, 2020	Multiple-choiceTopic Models	—Unverified
Inferring from Logits: Exploring Best Practices for Decoding-Free Generative Candidate Selection	Jan 28, 2025	Multiple-choice	—Unverified
Two-Turn Debate Doesn't Help Humans Answer Hard Reading Comprehension Questions	Oct 19, 2022	Language ModelingLanguage Modelling	—Unverified
InnerThoughts: Disentangling Representations and Predictions in Large Language Models	Jan 29, 2025	Multiple-choicePosition	—Unverified
InstructionBench: An Instructional Video Understanding Benchmark	Apr 7, 2025	Common Sense ReasoningMultiple-choice	—Unverified
Instruction Tuning and CoT Prompting for Contextual Medical QA with LLMs	Jun 13, 2025	Medical Question AnsweringMedQA	—Unverified
Instruction Tuning on Public Government and Cultural Data for Low-Resource Language: a Case Study in Kazakh	Feb 19, 2025	Instruction FollowingMultiple-choice	—Unverified
Uhura: A Benchmark for Evaluating Scientific Question Answering and Truthfulness in Low-Resource African Languages	Dec 1, 2024	ARCMultiple-choice	—Unverified
Interpretable Multi-Step Reasoning with Knowledge Extraction on Complex Healthcare Question Answering	Aug 6, 2020	Multiple-choiceQuestion Answering	—Unverified
Investigating and Addressing Hallucinations of LLMs in Tasks Involving Negation	Jun 8, 2024	Abstractive Text SummarizationDialogue Generation	—Unverified
Investigating Data Contamination in Modern Benchmarks for Large Language Models	Nov 16, 2023	Common Sense ReasoningMMLU	—Unverified
Self-Assessment Tests are Unreliable Measures of LLM Personality	Sep 15, 2023	Multiple-choice	—Unverified
Investigating the Effectiveness of ChatGPT in Mathematical Reasoning and Problem Solving: Evidence from the Vietnamese National High School Graduation Examination	Jun 10, 2023	MathMathematical Reasoning	—Unverified
Investigating Uncertainty Calibration of Aligned Language Models under the Multiple-Choice Setting	Oct 18, 2023	Multiple-choice	—Unverified
WikiMixQA: A Multimodal Benchmark for Question Answering over Tables and Charts	Jun 18, 2025	document understandingMultiple-choice	—Unverified
ISAAQ -- Mastering Textbook Questions with Pre-trained Transformers and Bottom-Up and Top-Down Attention	Oct 1, 2020	Multiple-choiceQuestion Answering	—Unverified
ISAAQ - Mastering Textbook Questions with Pre-trained Transformers and Bottom-Up and Top-Down Attention	Nov 1, 2020	Multiple-choiceQuestion Answering	—Unverified

Show:10 25 50

← PrevPage 14 of 23Next →

No leaderboard results yet.