| CodeReviewQA: The Code Review Comprehension Assessment for Large Language Models | Mar 20, 2025 | Code Generation, Multiple-choice | —Unverified | 0 | 0 |
| COGNET-MD, an evaluation framework and dataset for Large Language Model benchmarks in the medical domain | May 17, 2024 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Cognitive Biases in Large Language Models: A Survey and Mitigation Experiments | Nov 30, 2024 | Multiple-choice | —Unverified | 0 | 0 |
| Collaboration among Multiple Large Language Models for Medical Question Answering | May 22, 2025 | Medical Question Answering, Multiple-choice | —Unverified | 0 | 0 |
| Thrilled by Your Progress! Large Language Models (GPT-4) No Longer Struggle to Pass Assessments in Higher Education Programming Courses | Jun 15, 2023 | Multiple-choice | —Unverified | 0 | 0 |
| Combinatorial framework for planning in geological exploration | Jan 22, 2018 | Attribute, Multiple-choice | —Unverified | 0 | 0 |
| Combining Multiple Cues for Visual Madlibs Question Answering | Nov 1, 2016 | Attribute, General Classification | —Unverified | 0 | 0 |
| Comparative Study of Learning Outcomes for Online Learning Platforms | Apr 15, 2021 | Active Learning, Multiple-choice | —Unverified | 0 | 0 |
| Thunder-NUBench: A Benchmark for LLMs' Sentence-Level Negation Understanding | Jun 17, 2025 | Multiple-choice, Natural Language Inference | —Unverified | 0 | 0 |
| Confidence-Aware Learning Assistant | Feb 15, 2021 | Multiple-choice | —Unverified | 0 | 0 |