| AGReE: A system for generating Automated Grammar Reading Exercises | Oct 28, 2022 | Articles, Multiple-choice | —Unverified | 0 |
| Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining Data | Jul 20, 2024 | Language Modelling, Machine Translation | —Unverified | 0 |
| ISAAQ -- Mastering Textbook Questions with Pre-trained Transformers and Bottom-Up and Top-Down Attention | Oct 1, 2020 | Multiple-choice, Question Answering | —Unverified | 0 |
| Generating Adequate Distractors for Multiple-Choice Questions | Oct 23, 2020 | Form, Multiple-choice | —Unverified | 0 |
| End-to-end Concept Word Detection for Video Captioning, Retrieval, and Question Answering | Oct 10, 2016 | Language Modeling, Language Modelling | —Unverified | 0 |
| Generating Diagnostic Multiple Choice Comprehension Cloze Questions | Jun 1, 2012 | Diagnostic, Multiple-choice | —Unverified | 0 |
| Empowering Large Language Models in Wireless Communication: A Novel Dataset and Fine-Tuning Framework | Jan 16, 2025 | Multiple-choice, Question Generation | —Unverified | 0 |
| Generating multiple-choice questions for medical question answering with distractors and cue-masking | Mar 13, 2023 | Language Modeling, Language Modelling | —Unverified | 0 |
| Beyond VQA: Generating Multi-word Answer and Rationale to Visual Questions | Oct 24, 2020 | General Classification, Multiple-choice | —Unverified | 0 |
| Generating Questions and Multiple-Choice Answers using Semantic Analysis of Texts | Dec 1, 2016 | coreference-resolution, Coreference Resolution | —Unverified | 0 |
| LLMs May Perform MCQA by Selecting the Least Incorrect Option | Feb 2, 2024 | Multiple-choice, Multiple Choice Question Answering (MCQA) | —Unverified | 0 |
| Genome-Bench: A Scientific Reasoning Benchmark from Real-World Expert Discussions | May 26, 2025 | Multiple-choice | —Unverified | 0 |
| ELiRF-UPV at SemEval-2018 Task 11: Machine Comprehension using Commonsense Knowledge | Jun 1, 2018 | Multiple-choice, Question Answering | —Unverified | 0 |
| Good, Better, Best: Textual Distractors Generation for Multiple-Choice Visual Question Answering via Reinforcement Learning | Oct 21, 2019 | Data Augmentation, Decision Making | —Unverified | 0 |
| Evaluating Clinical Competencies of Large Language Models with a General Practice Benchmark | Mar 22, 2025 | Multiple-choice | —Unverified | 0 |
| Answer, Assemble, Ace: Understanding How Transformers Answer Multiple Choice Questions | Jul 21, 2024 | Multiple-choice, Multiple Choice Question Answering (MCQA) | —Unverified | 0 |
| GPT-4o System Card | Oct 25, 2024 | Multiple-choice, Spatial Reasoning | —Unverified | 0 |
| GPT-4 to GPT-3.5: 'Hold My Scalpel' -- A Look at the Competency of OpenAI's GPT on the Plastic Surgery In-Service Training Exam | Apr 4, 2023 | Multiple-choice | —Unverified | 0 |
| Interpretable Multi-Step Reasoning with Knowledge Extraction on Complex Healthcare Question Answering | Aug 6, 2020 | Multiple-choice, Question Answering | —Unverified | 0 |
| ANPMI: Assessing the True Comprehension Capabilities of LLMs for Multiple Choice Questions | Feb 26, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| CodeReviewQA: The Code Review Comprehension Assessment for Large Language Models | Mar 20, 2025 | Code Generation, Multiple-choice | —Unverified | 0 |
| GRAF: Graph Retrieval Augmented by Facts for Romanian Legal Multi-Choice Question Answering | Dec 5, 2024 | Information Retrieval, Multiple-choice | —Unverified | 0 |
| GraphITE: Estimating Individual Effects of Graph-structured Treatments | Sep 29, 2020 | counterfactual, Decision Making | —Unverified | 0 |
| Graph-Structured Representations for Visual Question Answering | Sep 19, 2016 | Multiple-choice, Question Answering | —Unverified | 0 |
| Cognitive Biases in Large Language Models: A Survey and Mitigation Experiments | Nov 30, 2024 | Multiple-choice | —Unverified | 0 |
| Is There No Such Thing as a Bad Question? H4R: HalluciBot For Ratiocination, Rewriting, Ranking, and Routing | Apr 18, 2024 | Hallucination, Multiple-choice | —Unverified | 0 |
| Hanfu-Bench: A Multimodal Benchmark on Cross-Temporal Cultural Understanding and Transcreation | Jun 2, 2025 | Multiple-choice, Question Answering | —Unverified | 0 |
| HANS, are you clever? Clever Hans Effect Analysis of Neural Systems | Sep 21, 2023 | Decision Making, Multiple-choice | —Unverified | 0 |
| A Graph-Guided Reasoning Approach for Open-ended Commonsense Question Answering | Mar 18, 2023 | Multiple-choice, Question Answering | —Unverified | 0 |
| Eliciting Categorical Data for Optimal Aggregation | Dec 1, 2016 | Multiple-choice | —Unverified | 0 |
| Eigen Values Features for the Classification of Brain Signals corresponding to 2D and 3D Educational Contents | Apr 30, 2019 | General Classification, Multiple-choice | —Unverified | 0 |
| Not All Options Are Created Equal: Textual Option Weighting for Token-Efficient LLM-Based Knowledge Tracing | Oct 14, 2024 | All, Binary Classification | —Unverified | 0 |
| HATS: Hindi Analogy Test Set for Evaluating Reasoning in Large Language Models | Jul 17, 2025 | Multiple-choice | —Unverified | 0 |
| Have Large Language Models Developed a Personality?: Applicability of Self-Assessment Tests in Measuring Personality in LLMs | May 24, 2023 | Multiple-choice | —Unverified | 0 |
| Investigating and Addressing Hallucinations of LLMs in Tasks Involving Negation | Jun 8, 2024 | Abstractive Text Summarization, Dialogue Generation | —Unverified | 0 |
| Efficient Knowledge Distillation: Empowering Small Language Models with Teacher Model Insights | Sep 19, 2024 | Decision Making, Knowledge Distillation | —Unverified | 0 |
| Beyond Profile: From Surface-Level Facts to Deep Persona Simulation in LLMs | Feb 18, 2025 | Generative Question Answering, Multiple-choice | —Unverified | 0 |
| HFL-RC System at SemEval-2018 Task 11: Hybrid Multi-Aspects Model for Commonsense Reading Comprehension | Mar 15, 2018 | Multiple-choice, Reading Comprehension | —Unverified | 0 |
| Hierarchical Divide-and-Conquer for Fine-Grained Alignment in LLM-Based Medical Evaluation | Jan 12, 2025 | Attribute, Multiple-choice | —Unverified | 0 |
| HindiLLM: Large Language Model for Hindi | Dec 29, 2024 | Language Modeling, Language Modelling | —Unverified | 0 |
| Beyond Probabilities: Unveiling the Misalignment in Evaluating Large Language Models | Feb 21, 2024 | Multiple-choice | —Unverified | 0 |
| A Novel Approach for Constrained Optimization in Graphical Models | Dec 1, 2020 | Multiple-choice | —Unverified | 0 |
| AgMMU: A Comprehensive Agricultural Multimodal Understanding and Reasoning Benchmark | Apr 14, 2025 | Management, Multiple-choice | —Unverified | 0 |
| How Far Can Off-the-Shelf Multimodal Large Language Models Go in Online Episodic Memory Question Answering? | Jun 19, 2025 | Multiple-choice, Question Answering | —Unverified | 0 |
| Edinburgh Clinical NLP at MEDIQA-CORR 2024: Guiding Large Language Models with Hints | May 28, 2024 | Multiple-choice, Sentence | —Unverified | 0 |
| Advanced Financial Reasoning at Scale: A Comprehensive Evaluation of Large Language Models on CFA Level III | Jun 29, 2025 | Model Selection, Multiple-choice | —Unverified | 0 |
| Beyond Multiple Choice: Evaluating Steering Vectors for Adaptive Free-Form Summarization | May 30, 2025 | Form, Language Modeling | —Unverified | 0 |
| How well do LLMs reason over tabular data, really? | May 12, 2025 | Missing Values, Multiple-choice | —Unverified | 0 |
| E-Commerce Promotions Personalization via Online Multiple-Choice Knapsack with Uplift Modeling | Aug 11, 2021 | Multiple-choice | —Unverified | 0 |
| Beyond Multiple-Choice Accuracy: Real-World Challenges of Implementing Large Language Models in Healthcare | Oct 24, 2024 | Multiple-choice | —Unverified | 0 |