| Learning to Attend On Essential Terms: An Enhanced Retriever-Reader Model for Open-domain Question Answering | Aug 28, 2018 | AI2 Reasoning ChallengeARC | CodeCode Available | 0 | 5 |
| BERT-based distractor generation for Swedish reading comprehension questions using a small-scale dataset | Aug 9, 2021 | Distractor GenerationMultiple-choice | CodeCode Available | 0 | 5 |
| DREAM: A Challenge Dataset and Models for Dialogue-Based Reading Comprehension | Feb 1, 2019 | Dialogue UnderstandingMultiple-choice | CodeCode Available | 0 | 5 |
| BertaQA: How Much Do Language Models Know About Local Culture? | Jun 11, 2024 | Multiple-choiceTransfer Learning | CodeCode Available | 0 | 5 |
| Learning to Correction: Explainable Feedback Generation for Visual Commonsense Reasoning Distractor | Dec 8, 2024 | MisconceptionsMultiple-choice | CodeCode Available | 0 | 5 |
| KGQuiz: Evaluating the Generalization of Encoded Knowledge in Large Language Models | Oct 15, 2023 | Multiple-choiceTriplet | CodeCode Available | 0 | 5 |
| Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation | Apr 9, 2025 | Multiple-choice | CodeCode Available | 0 | 5 |
| Joint Learning of Sentence Embeddings for Relevance and Entailment | May 16, 2016 | Decision MakingInformation Retrieval | CodeCode Available | 0 | 5 |
| SCoRE: Benchmarking Long-Chain Reasoning in Commonsense Scenarios | Mar 8, 2025 | BenchmarkingDiagnostic | CodeCode Available | 0 | 5 |
| Iterative Forward Tuning Boosts In-Context Learning in Language Models | May 22, 2023 | Decision MakingIn-Context Learning | CodeCode Available | 0 | 5 |