| Assessing AI-Generated Questions' Alignment with Cognitive Frameworks in Educational Assessment | Apr 19, 2025 | Classification, Multiple-choice | —Unverified | 0 | 0 |
| An AI-based Solution for Enhancing Delivery of Digital Learning for Future Teachers | Nov 9, 2021 | Multiple-choice, Question Generation | —Unverified | 0 | 0 |
| Addressing Blind Guessing: Calibration of Selection Bias in Multiple-Choice Question Answering by Video Language Models | Oct 18, 2024 | Fairness, Multiple-choice | —Unverified | 0 | 0 |
| HANS, are you clever? Clever Hans Effect Analysis of Neural Systems | Sep 21, 2023 | Decision Making, Multiple-choice | —Unverified | 0 | 0 |
| Hanfu-Bench: A Multimodal Benchmark on Cross-Temporal Cultural Understanding and Transcreation | Jun 2, 2025 | Multiple-choice, Question Answering | —Unverified | 0 | 0 |
| Collaboration among Multiple Large Language Models for Medical Question Answering | May 22, 2025 | Medical Question Answering, Multiple-choice | —Unverified | 0 | 0 |
| Is There No Such Thing as a Bad Question? H4R: HalluciBot For Ratiocination, Rewriting, Ranking, and Routing | Apr 18, 2024 | Hallucination, Multiple-choice | —Unverified | 0 | 0 |
| Cognitive Biases in Large Language Models: A Survey and Mitigation Experiments | Nov 30, 2024 | Multiple-choice | —Unverified | 0 | 0 |
| Graph-Structured Representations for Visual Question Answering | Sep 19, 2016 | Multiple-choice, Question Answering | —Unverified | 0 | 0 |
| GraphITE: Estimating Individual Effects of Graph-structured Treatments | Sep 29, 2020 | counterfactual, Decision Making | —Unverified | 0 | 0 |
| COGNET-MD, an evaluation framework and dataset for Large Language Model benchmarks in the medical domain | May 17, 2024 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| GRAF: Graph Retrieval Augmented by Facts for Romanian Legal Multi-Choice Question Answering | Dec 5, 2024 | Information Retrieval, Multiple-choice | —Unverified | 0 | 0 |
| CodeReviewQA: The Code Review Comprehension Assessment for Large Language Models | Mar 20, 2025 | Code Generation, Multiple-choice | —Unverified | 0 | 0 |
| A Shortcut-aware Video-QA Benchmark for Physical Understanding via Minimal Video Pairs | Jun 11, 2025 | Multiple-choice | —Unverified | 0 | 0 |
| GPT-4 to GPT-3.5: 'Hold My Scalpel' -- A Look at the Competency of OpenAI's GPT on the Plastic Surgery In-Service Training Exam | Apr 4, 2023 | Multiple-choice | —Unverified | 0 | 0 |
| GPT-4o System Card | Oct 25, 2024 | Multiple-choice, Spatial Reasoning | —Unverified | 0 | 0 |
| CoddLLM: Empowering Large Language Models for Data Analytics | Feb 1, 2025 | Multiple-choice, Synthetic Data Generation | —Unverified | 0 | 0 |
| A Semantic Parsing Algorithm to Solve Linear Ordering Problems | Feb 12, 2025 | Multiple-choice, Semantic Parsing | —Unverified | 0 | 0 |
| Evaluating Clinical Competencies of Large Language Models with a General Practice Benchmark | Mar 22, 2025 | Multiple-choice | —Unverified | 0 | 0 |
| Good, Better, Best: Textual Distractors Generation for Multiple-Choice Visual Question Answering via Reinforcement Learning | Oct 21, 2019 | Data Augmentation, Decision Making | —Unverified | 0 | 0 |
| GeoCode-GPT: A Large Language Model for Geospatial Code Generation Tasks | Oct 22, 2024 | Code Generation, Code Summarization | —Unverified | 0 | 0 |
| A Semantic Feature-Wise Transformation Relation Network for Automatic Short Answer Grading | Nov 1, 2021 | automatic short answer grading, Data Augmentation | —Unverified | 0 | 0 |
| An Add-On for Empowering Google Forms to be an Automatic Question Generator in Online Assessments | Sep 21, 2021 | Multiple-choice | —Unverified | 0 | 0 |
| Genome-Bench: A Scientific Reasoning Benchmark from Real-World Expert Discussions | May 26, 2025 | Multiple-choice | —Unverified | 0 | 0 |
| GenNet : Reading Comprehension with Multiple Choice Questions using Generation and Selection model | Mar 3, 2020 | Answer Generation, Machine Reading Comprehension | —Unverified | 0 | 0 |