| Answer-level Calibration for Free-form Multiple Choice Question Answering | May 1, 2022 | Form, Language Modeling | Code Available | 0 |
| Sentence Embeddings for Russian NLU | Oct 29, 2019 | Multiple-choice, Paraphrase Identification | Code Available | 0 |
| Language Models as Knowledge Bases for Visual Word Sense Disambiguation | Oct 3, 2023 | Image Captioning, Multiple-choice | Code Available | 0 |
| Multimodal Residual Learning for Visual QA | Jun 5, 2016 | Multiple-choice, Question Answering | Code Available | 0 |
| QASC: A Dataset for Question Answering via Sentence Composition | Oct 25, 2019 | Common Sense Reasoning, Multi-hop Question Answering | Code Available | 0 |
| VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation | Aug 15, 2017 | Language Modeling, Language Modelling | Code Available | 0 |
| Simulating Training Data Leakage in Multiple-Choice Benchmarks for LLM Evaluation | May 30, 2025 | Continual Pretraining, Fairness | Code Available | 0 |
| Every Answer Matters: Evaluating Commonsense with Probabilistic Measures | Jun 6, 2024 | Common Sense Reasoning, Language Modeling | Code Available | 0 |
| Evidence Sentence Extraction for Machine Reading Comprehension | Feb 23, 2019 | Machine Reading Comprehension, Multiple-choice | Code Available | 0 |
| BertaQA: How Much Do Language Models Know About Local Culture? | Jun 11, 2024 | Multiple-choice, Transfer Learning | Code Available | 0 |
| EXAMS-V: A Multi-Discipline Multilingual Multimodal Exam Benchmark for Evaluating Vision Language Models | Mar 15, 2024 | Miscellaneous, Multiple-choice | Code Available | 0 |
| SNS-Bench-VL: Benchmarking Multimodal Large Language Models in Social Networking Services | May 29, 2025 | Benchmarking, Information Retrieval | Code Available | 0 |
| BERT-based distractor generation for Swedish reading comprehension questions using a small-scale dataset | Aug 9, 2021 | Distractor Generation, Multiple-choice | Code Available | 0 |
| Quantitative Assessment of Intersectional Empathetic Bias and Understanding | Nov 8, 2024 | Multiple-choice | Code Available | 0 |
| Explanatory Argument Extraction of Correct Answers in Resident Medical Exams | Dec 1, 2023 | Multiple-choice | Code Available | 0 |
| Multiple Choice Questions and Large Languages Models: A Case Study with Fictional Medical Data | Jun 4, 2024 | Clinical Knowledge, Multiple-choice | Code Available | 0 |
| Cascading Biases: Investigating the Effect of Heuristic Annotation Strategies on Data and Models | Oct 24, 2022 | Multiple-choice, Reading Comprehension | Code Available | 0 |
| Automated Distractor and Feedback Generation for Math Multiple-choice Questions via In-context Learning | Aug 7, 2023 | In-Context Learning, Math | Code Available | 0 |
| Exploring Automated Distractor Generation for Math Multiple-choice Questions via Large Language Models | Apr 2, 2024 | Distractor Generation, In-Context Learning | Code Available | 0 |
| Exploring Iterative Enhancement for Improving Learnersourced Multiple-Choice Question Explanations with Large Language Models | Sep 19, 2023 | Explanation Generation, Language Modelling | Code Available | 0 |
| Question Answering as Global Reasoning over Semantic Abstractions | Jun 9, 2019 | Information Retrieval, Multiple-choice | Code Available | 0 |
| KnowledgePrompts: Exploring the Abilities of Large Language Models to Solve Proportional Analogies via Knowledge-Enhanced Prompting | Dec 1, 2024 | Multiple-choice, Multiple Choice Question Answering (MCQA) | Code Available | 0 |
| Multiple Hypothesis Dropout: Estimating the Parameters of Multi-Modal Output Distributions | Dec 18, 2023 | Multiple-choice, Pedestrian Trajectory Prediction | Code Available | 0 |
| Question-Aware Knowledge Graph Prompting for Enhancing Large Language Models | Mar 30, 2025 | Knowledge Graphs, Multiple-choice | Code Available | 0 |
| An Automatic Question Usability Evaluation Toolkit | May 30, 2024 | Multiple-choice, Word Embeddings | Code Available | 0 |