| EXAMS-V: A Multi-Discipline Multilingual Multimodal Exam Benchmark for Evaluating Vision Language Models | Mar 15, 2024 | Miscellaneous, Multiple-choice | Code Available | 0 |
| SNS-Bench-VL: Benchmarking Multimodal Large Language Models in Social Networking Services | May 29, 2025 | Benchmarking, Information Retrieval | Code Available | 0 |
| BERT-based distractor generation for Swedish reading comprehension questions using a small-scale dataset | Aug 9, 2021 | Distractor Generation, Multiple-choice | Code Available | 0 |
| Quantitative Assessment of Intersectional Empathetic Bias and Understanding | Nov 8, 2024 | Multiple-choice | Code Available | 0 |
| Explanatory Argument Extraction of Correct Answers in Resident Medical Exams | Dec 1, 2023 | Multiple-choice | Code Available | 0 |
| Multiple Choice Questions and Large Languages Models: A Case Study with Fictional Medical Data | Jun 4, 2024 | Clinical Knowledge, Multiple-choice | Code Available | 0 |
| Cascading Biases: Investigating the Effect of Heuristic Annotation Strategies on Data and Models | Oct 24, 2022 | Multiple-choice, Reading Comprehension | Code Available | 0 |
| Automated Distractor and Feedback Generation for Math Multiple-choice Questions via In-context Learning | Aug 7, 2023 | In-Context Learning, Math | Code Available | 0 |
| Exploring Automated Distractor Generation for Math Multiple-choice Questions via Large Language Models | Apr 2, 2024 | Distractor Generation, In-Context Learning | Code Available | 0 |
| Exploring Iterative Enhancement for Improving Learnersourced Multiple-Choice Question Explanations with Large Language Models | Sep 19, 2023 | Explanation Generation, Language Modelling | Code Available | 0 |