| How Far Can Off-the-Shelf Multimodal Large Language Models Go in Online Episodic Memory Question Answering? | Jun 19, 2025 | Multiple-choiceQuestion Answering | —Unverified | 0 | 0 |
| How Many Workers to Ask? Adaptive Exploration for Collecting High Quality Labels | Nov 1, 2014 | Multiple-choice | —Unverified | 0 | 0 |
| How Susceptible are LLMs to Influence in Prompts? | Aug 17, 2024 | Multiple-choiceQuestion Answering | —Unverified | 0 | 0 |
| How well do LLMs reason over tabular data, really? | May 12, 2025 | Missing ValuesMultiple-choice | —Unverified | 0 | 0 |
| HRCA+: Advanced Multiple-choice Machine Reading Comprehension Method | Jun 1, 2022 | Machine Reading ComprehensionMultiple-choice | —Unverified | 0 | 0 |
| Humanity's Last Exam | Jan 24, 2025 | Humanity's Last ExamLanguage Modeling | —Unverified | 0 | 0 |
| Humans and Large Language Models in Clinical Decision Support: A Study with Medical Calculators | Nov 8, 2024 | Decision MakingMultiple-choice | —Unverified | 0 | 0 |
| Hypothesis Testing for Quantifying LLM-Human Misalignment in Multiple Choice Settings | Jun 17, 2025 | Decision MakingLanguage Modeling | —Unverified | 0 | 0 |
| Identification of mental fatigue in language comprehension tasks based on EEG and deep learning | Apr 14, 2021 | ClassificationEEG | —Unverified | 0 | 0 |
| Treatment Effects with Multidimensional Unobserved Heterogeneity: Identification of the Marginal Treatment Effect | Sep 23, 2022 | Multiple-choice | —Unverified | 0 | 0 |
| Identifying Multiple Personalities in Large Language Models with External Evaluation | Feb 22, 2024 | Multiple-choice | —Unverified | 0 | 0 |
| Identity Lock: Locking API Fine-tuned LLMs With Identity-based Wake Words | Mar 10, 2025 | Multiple-choice | —Unverified | 0 | 0 |
| IIE-NLP-Eyas at SemEval-2021 Task 4: Enhancing PLM for ReCAM with Special Tokens, Re-Ranking, Siamese Encoders and Back Translation | Feb 25, 2021 | Multiple-choiceQuestion Answering | —Unverified | 0 | 0 |
| IIE-NLP-NUT at SemEval-2020 Task 4: Guiding PLM with Prompt Template Reconstruction Strategy for ComVE | Jul 2, 2020 | Multiple-choiceQuestion Answering | —Unverified | 0 | 0 |
| IllusionBench: A Large-scale and Comprehensive Benchmark for Visual Illusion Understanding in Vision-Language Models | Jan 1, 2025 | HallucinationMultiple-choice | —Unverified | 0 | 0 |
| Image Aesthetic Reasoning: A New Benchmark for Medical Image Screening with MLLMs | May 29, 2025 | Image GenerationMultiple-choice | —Unverified | 0 | 0 |
| Imagery as Inquiry: Exploring A Multimodal Dataset for Conversational Recommendation | May 23, 2024 | Conversational RecommendationMultiple-choice | —Unverified | 0 | 0 |
| Improved Few-Shot Image Classification Through Multiple-Choice Questions | Jul 23, 2024 | ArticlesFew-Shot Image Classification | —Unverified | 0 | 0 |
| Improvement/Extension of Modular Systems as Combinatorial Reengineering (Survey) | Apr 17, 2013 | Combinatorial OptimizationMultiple-choice | —Unverified | 0 | 0 |
| Improving Automated Distractor Generation for Math Multiple-choice Questions with Overgenerate-and-rank | Apr 19, 2024 | Distractor GenerationMath | —Unverified | 0 | 0 |
| Improving LLM First-Token Predictions in Multiple-Choice Question Answering via Prefilling Attack | May 21, 2025 | Multiple-choiceMultiple Choice Question Answering (MCQA) | —Unverified | 0 | 0 |
| Analysing the Effect of Masking Length Distribution of MLM: An Evaluation Framework and Case Study on Chinese MRC Datasets | Sep 29, 2021 | Language ModellingMachine Reading Comprehension | —Unverified | 0 | 0 |
| Improving the Production Efficiency and Well-formedness of Automatically-Generated Multiple-Choice Cloze Vocabulary Questions | May 1, 2020 | Multiple-choice | —Unverified | 0 | 0 |
| In Case You Missed It: ARC 'Challenge' Is Not That Challenging | Dec 23, 2024 | ARCMultiple-choice | —Unverified | 0 | 0 |
| TVBench: Redesigning Video-Language Evaluation | Oct 10, 2024 | Multiple-choiceOpen-Ended Question Answering | —Unverified | 0 | 0 |