| Humans and Large Language Models in Clinical Decision Support: A Study with Medical Calculators | Nov 8, 2024 | Decision Making, Multiple-choice | —Unverified | 0 |
| ACPBench Hard: Unrestrained Reasoning about Action, Change, and Planning | Mar 31, 2025 | Multiple-choice | —Unverified | 0 |
| DsMCL: Dual-Level Stochastic Multiple Choice Learning for Multi-Modal Trajectory Prediction | Mar 19, 2020 | Multiple-choice, Prediction | —Unverified | 0 |
| Identification of mental fatigue in language comprehension tasks based on EEG and deep learning | Apr 14, 2021 | Classification, EEG | —Unverified | 0 |
| Treatment Effects with Multidimensional Unobserved Heterogeneity: Identification of the Marginal Treatment Effect | Sep 23, 2022 | Multiple-choice | —Unverified | 0 |
| Identifying Multiple Personalities in Large Language Models with External Evaluation | Feb 22, 2024 | Multiple-choice | —Unverified | 0 |
| Contextual Response Interpretation for Automated Structured Interviews: A Case Study in Market Research | Apr 30, 2023 | Marketing, Multiple-choice | —Unverified | 0 |
| Identity Lock: Locking API Fine-tuned LLMs With Identity-based Wake Words | Mar 10, 2025 | Multiple-choice | —Unverified | 0 |
| IIE-NLP-Eyas at SemEval-2021 Task 4: Enhancing PLM for ReCAM with Special Tokens, Re-Ranking, Siamese Encoders and Back Translation | Feb 25, 2021 | Multiple-choice, Question Answering | —Unverified | 0 |
| IIE-NLP-NUT at SemEval-2020 Task 4: Guiding PLM with Prompt Template Reconstruction Strategy for ComVE | Jul 2, 2020 | Multiple-choice, Question Answering | —Unverified | 0 |
| DRIVINGVQA: Analyzing Visual Chain-of-Thought Reasoning of Vision Language Models in Real-World Scenarios with Driving Theory Tests | Jan 8, 2025 | Multimodal Reasoning, Multiple-choice | —Unverified | 0 |
| AGenT Zero: Zero-shot Automatic Multiple-Choice Question Generation for Skill Assessments | Nov 25, 2020 | Multiple-choice, Question Generation | —Unverified | 0 |
| DREAM: A Challenge Data Set and Models for Dialogue-Based Reading Comprehension | Mar 1, 2019 | Dialogue Understanding, Multiple-choice | —Unverified | 0 |
| AfriMed-QA: A Pan-African, Multi-Specialty, Medical Question-Answering Benchmark Dataset | Nov 23, 2024 | Language Modeling, Language Modelling | —Unverified | 0 |
| DP-SSL: Towards Robust Semi-supervised Learning with A Few Labeled Samples | Oct 26, 2021 | Multiple-choice, Semi-Supervised Image Classification | —Unverified | 0 |
| Do LLMs Recognize me, When I is not me: Assessment of LLMs Understanding of Turkish Indexical Pronouns in Indexical Shift Contexts | Jun 8, 2024 | Machine Translation, Multiple-choice | —Unverified | 0 |
| Benchmarks for Pirá 2.0, a Reading Comprehension Dataset about the Ocean, the Brazilian Coast, and Climate Change | Sep 19, 2023 | Generative Question Answering, Information Retrieval | —Unverified | 0 |
| Do LLMs Make Mistakes Like Students? Exploring Natural Alignment between Language Models and Human Error Patterns | Feb 21, 2025 | Distractor Generation, Multiple-choice | —Unverified | 0 |
| Do LLMs Know When to NOT Answer? Investigating Abstention Abilities of Large Language Models | Jul 23, 2024 | Language Modelling, Large Language Model | —Unverified | 0 |
| Benchmarking Next-Generation Reasoning-Focused Large Language Models in Ophthalmology: A Head-to-Head Evaluation on 5,888 Items | Apr 15, 2025 | Benchmarking, Multiple-choice | —Unverified | 0 |
| Do LLMs Act as Repositories of Causal Knowledge? | Dec 14, 2024 | Causal Inference, Multiple-choice | —Unverified | 0 |
| Do Large Language Models Know Folktales? A Case Study of Yokai in Japanese Folktales | Jun 4, 2025 | Multiple-choice | —Unverified | 0 |
| Do Fine-tuned Commonsense Language Models Really Generalize? | Nov 18, 2020 | Multiple-choice, Question Answering | —Unverified | 0 |
| An MRC Framework for Semantic Role Labeling | Jan 16, 2022 | Computational Efficiency, Machine Reading Comprehension | —Unverified | 0 |
| Linguistic Legal Concept Extraction in Portuguese | Oct 22, 2018 | Ethics, Multiple-choice | —Unverified | 0 |
| LMVE at SemEval-2020 Task 4: Commonsense Validation and Explanation using Pretraining Language Model | Jul 6, 2020 | Common Sense Reasoning, Language Modeling | —Unverified | 0 |
| Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla | Jul 18, 2023 | Multiple-choice, Question Answering | —Unverified | 0 |
| Benchmarking Bias in Large Language Models during Role-Playing | Nov 1, 2024 | Benchmarking, Fairness | —Unverified | 0 |
| Document-level Event Factuality Identification via Machine Reading Comprehension Frameworks with Transfer Learning | Oct 1, 2022 | Data Augmentation, Machine Reading Comprehension | —Unverified | 0 |
| DMind Benchmark: Toward a Holistic Assessment of LLM Capabilities across the Web3 Domain | Apr 18, 2025 | Multiple-choice | —Unverified | 0 |
| A Corpus of Text Data and Gaze Fixations from Autistic and Non-Autistic Adults | May 1, 2016 | Multiple-choice, POS | —Unverified | 0 |
| Large Language Models Still Exhibit Bias in Long Text | Oct 23, 2024 | Fairness, Multiple-choice | —Unverified | 0 |
| DiverseNet: When One Right Answer is not Enough | Aug 24, 2020 | Multiple-choice, Structured Prediction | —Unverified | 0 |
| Being Negative but Constructively: Lessons Learnt from Creating Better Visual Question Answering Datasets | Apr 24, 2017 | Multiple-choice, Question Answering | —Unverified | 0 |
| Learning a Word-Level Language Model with Sentence-Level Noise Contrastive Estimation for Contextual Sentence Probability Estimation | Mar 14, 2021 | Language Modeling, Language Modelling | —Unverified | 0 |
| Distributional semantics beyond words: Supervised learning of analogy and paraphrase | Oct 18, 2013 | Multiple-choice, Task 2 | —Unverified | 0 |
| Distractor Generation in Multiple-Choice Tasks: A Survey of Methods, Datasets, and Evaluation | Feb 2, 2024 | Distractor Generation, Multiple-choice | —Unverified | 0 |
| Bayesian Statistical Modeling with Predictors from LLMs | Jun 13, 2024 | Multiple-choice | —Unverified | 0 |
| A Weak Supervision Approach for Predicting Difficulty of Technical Interview Questions | Oct 1, 2022 | Multiple-choice, Prediction | —Unverified | 0 |
| Large Language Models (GPT) Struggle to Answer Multiple-Choice Questions about Code | Mar 9, 2023 | Multiple-choice | —Unverified | 0 |
| Large Language Models Often Know When They Are Being Evaluated | May 28, 2025 | MMLU, Multiple-choice | —Unverified | 0 |
| Distractor Analysis and Selection for Multiple-Choice Cloze Questions for Second-Language Learners | Jul 1, 2020 | Multiple-choice | —Unverified | 0 |
| DISTO: Evaluating Textual Distractors for Multi-Choice Questions using Negative Sampling based Approach | Apr 10, 2023 | Distractor Generation, Machine Translation | —Unverified | 0 |
| Auxiliary Class Based Multiple Choice Learning | Aug 6, 2021 | Diversity, Ensemble Learning | —Unverified | 0 |
| Disaggregating Hops: Can We Guide a Multi-Hop Reasoning Language Model to Incrementally Learn at each Hop? | Jan 16, 2022 | Language Modeling, Language Modelling | —Unverified | 0 |
| An Improved Traditional Chinese Evaluation Suite for Foundation Model | Mar 4, 2024 | Multiple-choice, Question Answering | —Unverified | 0 |
| A Foundational Multimodal Vision Language AI Assistant for Human Pathology | Dec 13, 2023 | Decision Making, Diagnostic | —Unverified | 0 |
| Large Language Models Sensitivity to The Order of Options in Multiple-Choice Questions | Aug 22, 2023 | Multiple-choice, Sensitivity | —Unverified | 0 |
| Learning Language-Visual Embedding for Movie Understanding with Natural-Language | Sep 26, 2016 | Multiple-choice, Retrieval | —Unverified | 0 |
| Digital Comprehensibility Assessment of Simplified Texts among Persons with Intellectual Disabilities | Feb 20, 2024 | Multiple-choice, Text Simplification | —Unverified | 0 |