| Weighted Global Normalization for Multiple Choice Reading Comprehension over Long Documents | Dec 5, 2018 | Answer Selection, Multiple-choice | —Unverified | 0 | 0 |
| Recent Advances in Multi-Choice Machine Reading Comprehension: A Survey on Methods and Datasets | Aug 4, 2024 | Few-Shot Learning, Machine Reading Comprehension | —Unverified | 0 | 0 |
| Correctness Coverage Evaluation for Medical Multiple-Choice Question Answering Based on the Enhanced Conformal Prediction Framework | Mar 7, 2025 | Conformal Prediction, Medical Question Answering | —Unverified | 0 | 0 |
| Statistically Profiling Biases in Natural Language Reasoning Datasets and Models | Feb 9, 2021 | Multiple-choice, Natural Language Understanding | —Unverified | 0 | 0 |
| Adaptive Crowdsourcing Algorithms for the Bandit Survey Problem | Feb 13, 2013 | Information Retrieval, Multiple-choice | —Unverified | 0 | 0 |
| Stick to your Role! Stability of Personal Values Expressed in Large Language Models | Feb 19, 2024 | Multiple-choice | —Unverified | 0 | 0 |
| Stochastic Multiple Choice Learning for Training Diverse Deep Ensembles | Jun 24, 2016 | Multiple-choice | —Unverified | 0 | 0 |
| Adapting Vision-Language Models for Evaluating World Models | Jun 22, 2025 | Action Recognition, Multimodal Reasoning | —Unverified | 0 | 0 |
| Strategyproof Mean Estimation from Multiple-Choice Questions | Jan 1, 2020 | Multiple-choice | —Unverified | 0 | 0 |
| Structured Outputs Enable General-Purpose LLMs to be Medical Experts | Mar 5, 2025 | Clinical Knowledge, Medical Question Answering | —Unverified | 0 | 0 |
| What does BERT Learn from Multiple-Choice Reading Comprehension Datasets? | Oct 28, 2019 | Multiple-choice, Reading Comprehension | —Unverified | 0 | 0 |
| Superhuman performance of a large language model on the reasoning tasks of a physician | Dec 14, 2024 | Diagnostic, Language Modeling | —Unverified | 0 | 0 |
| What do we expect from Multiple-choice QA Systems? | Nov 20, 2020 | Multiple-choice, Multiple Choice Question Answering (MCQA) | —Unverified | 0 | 0 |
| What Gives the Answer Away? Question Answering Bias Analysis on Video QA Datasets | Jul 7, 2020 | Multiple-choice, Question Answering | —Unverified | 0 | 0 |
| Susu Box or Piggy Bank: Assessing Cultural Commonsense Knowledge between Ghana and the U.S | Oct 21, 2024 | Multiple-choice | —Unverified | 0 | 0 |
| SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference | Aug 16, 2018 | Common Sense Reasoning, Multiple-choice | —Unverified | 0 | 0 |
| SynDARin: Synthesising Datasets for Automated Reasoning in Low-Resource Languages | Jun 20, 2024 | Language Modelling, Large Language Model | —Unverified | 0 | 0 |
| TabMCQ: A Dataset of General Knowledge Tables and Multiple-choice Questions | Feb 12, 2016 | General Knowledge, Multiple-choice | —Unverified | 0 | 0 |
| TA-MAMC at SemEval-2021 Task 4: Task-adaptive Pretraining and Multi-head Attention for Abstract Meaning Reading Comprehension | Aug 1, 2021 | Contrastive Learning, Multiple-choice | —Unverified | 0 | 0 |
| Task-Adaptive Pretrained Language Models via Clustered-Importance Sampling | Sep 30, 2024 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| TCM-Ladder: A Benchmark for Multimodal Question Answering on Traditional Chinese Medicine | May 29, 2025 | Diagnostic, Multiple-choice | —Unverified | 0 | 0 |
| Tell Me Who Your Students Are: GPT Can Generate Valid Multiple-Choice Questions When Students' (Mis)Understanding Is Hinted | May 9, 2025 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Empowering Sentence Encoders with Prompting and Label Retrieval for Zero-shot Text Classification | Dec 20, 2022 | Classification, Descriptive | —Unverified | 0 | 0 |
| Testing Uncertainty of Large Language Models for Physics Knowledge and Reasoning | Nov 18, 2024 | Logical Reasoning, Multiple-choice | —Unverified | 0 | 0 |
| Answering Chinese Elementary School Social Studies Multiple Choice Questions | Dec 1, 2021 | Multiple-choice | —Unverified | 0 | 0 |