| Stick to your Role! Stability of Personal Values Expressed in Large Language Models | Feb 19, 2024 | Multiple-choice | —Unverified | 0 | 0 |
| Stochastic Multiple Choice Learning for Training Diverse Deep Ensembles | Jun 24, 2016 | Multiple-choice | —Unverified | 0 | 0 |
| Adapting Vision-Language Models for Evaluating World Models | Jun 22, 2025 | Action Recognition, Multimodal Reasoning | —Unverified | 0 | 0 |
| Strategyproof Mean Estimation from Multiple-Choice Questions | Jan 1, 2020 | Multiple-choice | —Unverified | 0 | 0 |
| Structured Outputs Enable General-Purpose LLMs to be Medical Experts | Mar 5, 2025 | Clinical Knowledge, Medical Question Answering | —Unverified | 0 | 0 |
| What does BERT Learn from Multiple-Choice Reading Comprehension Datasets? | Oct 28, 2019 | Multiple-choice, Reading Comprehension | —Unverified | 0 | 0 |
| Superhuman performance of a large language model on the reasoning tasks of a physician | Dec 14, 2024 | Diagnostic, Language Modeling | —Unverified | 0 | 0 |
| What do we expect from Multiple-choice QA Systems? | Nov 20, 2020 | Multiple-choice, Multiple Choice Question Answering (MCQA) | —Unverified | 0 | 0 |
| What Gives the Answer Away? Question Answering Bias Analysis on Video QA Datasets | Jul 7, 2020 | Multiple-choice, Question Answering | —Unverified | 0 | 0 |
| Susu Box or Piggy Bank: Assessing Cultural Commonsense Knowledge between Ghana and the U.S | Oct 21, 2024 | Multiple-choice | —Unverified | 0 | 0 |