| Humans and Large Language Models in Clinical Decision Support: A Study with Medical Calculators | Nov 8, 2024 | Decision Making, Multiple-choice | —Unverified | 0 |
| E-Commerce Promotions Personalization via Online Multiple-Choice Knapsack with Uplift Modeling | Aug 11, 2021 | Multiple-choice | —Unverified | 0 |
| Hypothesis Testing for Quantifying LLM-Human Misalignment in Multiple Choice Settings | Jun 17, 2025 | Decision Making, Language Modeling | —Unverified | 0 |
| Identification of mental fatigue in language comprehension tasks based on EEG and deep learning | Apr 14, 2021 | Classification, EEG | —Unverified | 0 |
| Treatment Effects with Multidimensional Unobserved Heterogeneity: Identification of the Marginal Treatment Effect | Sep 23, 2022 | Multiple-choice | —Unverified | 0 |
| Identifying Multiple Personalities in Large Language Models with External Evaluation | Feb 22, 2024 | Multiple-choice | —Unverified | 0 |
| Contextual Response Interpretation for Automated Structured Interviews: A Case Study in Market Research | Apr 30, 2023 | Marketing, Multiple-choice | —Unverified | 0 |
| Identity Lock: Locking API Fine-tuned LLMs With Identity-based Wake Words | Mar 10, 2025 | Multiple-choice | —Unverified | 0 |
| IIE-NLP-Eyas at SemEval-2021 Task 4: Enhancing PLM for ReCAM with Special Tokens, Re-Ranking, Siamese Encoders and Back Translation | Feb 25, 2021 | Multiple-choice, Question Answering | —Unverified | 0 |
| IIE-NLP-NUT at SemEval-2020 Task 4: Guiding PLM with Prompt Template Reconstruction Strategy for ComVE | Jul 2, 2020 | Multiple-choice, Question Answering | —Unverified | 0 |
| Beyond Multiple-Choice Accuracy: Real-World Challenges of Implementing Large Language Models in Healthcare | Oct 24, 2024 | Multiple-choice | —Unverified | 0 |
| ACPBench: Reasoning about Action, Change, and Planning | Oct 8, 2024 | Multiple-choice | —Unverified | 0 |
| E-cheating Prevention Measures: Detection of Cheating at Online Examinations Using Deep Learning Approach -- A Case Study | Jan 25, 2021 | Multiple-choice | —Unverified | 0 |
| Better Distractions: Transformer-based Distractor Generation and Multiple Choice Question Filtering | Oct 19, 2020 | Distractor Generation, Language Modeling | —Unverified | 0 |
| KorMedMCQA: Multi-Choice Question Answering Benchmark for Korean Healthcare Professional Licensing Examinations | Mar 3, 2024 | MedQA, MMLU | —Unverified | 0 |
| Dual Co-Matching Network for Multi-choice Reading Comprehension | Jan 27, 2019 | Machine Reading Comprehension, Multiple-choice | —Unverified | 0 |
| ACPBench Hard: Unrestrained Reasoning about Action, Change, and Planning | Mar 31, 2025 | Multiple-choice | —Unverified | 0 |
| KorNAT: LLM Alignment Benchmark for Korean Social Values and Common Knowledge | Feb 21, 2024 | 4k, Multiple-choice | —Unverified | 0 |
| DsMCL: Dual-Level Stochastic Multiple Choice Learning for Multi-Modal Trajectory Prediction | Mar 19, 2020 | Multiple-choice, Prediction | —Unverified | 0 |
| DRIVINGVQA: Analyzing Visual Chain-of-Thought Reasoning of Vision Language Models in Real-World Scenarios with Driving Theory Tests | Jan 8, 2025 | Multimodal Reasoning, Multiple-choice | —Unverified | 0 |
| AGenT Zero: Zero-shot Automatic Multiple-Choice Question Generation for Skill Assessments | Nov 25, 2020 | Multiple-choice, Question Generation | —Unverified | 0 |
| DREAM: A Challenge Data Set and Models for Dialogue-Based Reading Comprehension | Mar 1, 2019 | Dialogue Understanding, Multiple-choice | —Unverified | 0 |
| AfriMed-QA: A Pan-African, Multi-Specialty, Medical Question-Answering Benchmark Dataset | Nov 23, 2024 | Language Modeling, Language Modelling | —Unverified | 0 |
| KoBALT: Korean Benchmark For Advanced Linguistic Tasks | May 22, 2025 | Multiple-choice | —Unverified | 0 |
| KRISTEVA: Close Reading as a Novel Task for Benchmarking Interpretive Reasoning | May 14, 2025 | Benchmarking, MMLU | —Unverified | 0 |