| GANDALF: a General Character Name Description Dataset for Long Fiction | Nov 1, 2021 | Multiple-choice, Question Answering | —Unverified | 0 | 0 |
| GEMeX: A Large-Scale, Groundable, and Explainable Medical VQA Benchmark for Chest X-ray Diagnosis | Nov 25, 2024 | Medical Visual Question Answering, Multiple-choice | —Unverified | 0 | 0 |
| Generalised Winograd Schema and its Contextuality | Aug 31, 2023 | coreference-resolution, Coreference Resolution | —Unverified | 0 | 0 |
| Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining Data | Jul 20, 2024 | Language Modelling, Machine Translation | —Unverified | 0 | 0 |
| Who did What: A Large-Scale Person-Centered Cloze Dataset | Aug 19, 2016 | Articles, Multiple-choice | —Unverified | 0 | 0 |
| Generating Adequate Distractors for Multiple-Choice Questions | Oct 23, 2020 | Form, Multiple-choice | —Unverified | 0 | 0 |
| Generating Correct Answers for Progressive Matrices Intelligence Tests | Nov 1, 2020 | Multiple-choice | —Unverified | 0 | 0 |
| Generating Diagnostic Multiple Choice Comprehension Cloze Questions | Jun 1, 2012 | Diagnostic, Multiple-choice | —Unverified | 0 | 0 |
| Who's the Best Detective? LLMs vs. MLs in Detecting Incoherent Fourth Grade Math Answers | Apr 21, 2023 | Math, Multiple-choice | —Unverified | 0 | 0 |
| Generating multiple-choice questions for medical question answering with distractors and cue-masking | Mar 13, 2023 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |