| Identity Lock: Locking API Fine-tuned LLMs With Identity-based Wake Words | Mar 10, 2025 | Multiple-choice | —Unverified | 0 |
| Genome-Bench: A Scientific Reasoning Benchmark from Real-World Expert Discussions | May 26, 2025 | Multiple-choice | —Unverified | 0 |
| ANPMI: Assessing the True Comprehension Capabilities of LLMs for Multiple Choice Questions | Feb 26, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| A Graph-Guided Reasoning Approach for Open-ended Commonsense Question Answering | Mar 18, 2023 | Multiple-choice, Question Answering | —Unverified | 0 |
| Evaluating Clinical Competencies of Large Language Models with a General Practice Benchmark | Mar 22, 2025 | Multiple-choice | —Unverified | 0 |
| Eliciting Categorical Data for Optimal Aggregation | Dec 1, 2016 | Multiple-choice | —Unverified | 0 |
| GPT-4o System Card | Oct 25, 2024 | Multiple-choice, Spatial Reasoning | —Unverified | 0 |
| GPT-4 to GPT-3.5: 'Hold My Scalpel' -- A Look at the Competency of OpenAI's GPT on the Plastic Surgery In-Service Training Exam | Apr 4, 2023 | Multiple-choice | —Unverified | 0 |
| Eigen Values Features for the Classification of Brain Signals corresponding to 2D and 3D Educational Contents | Apr 30, 2019 | General Classification, Multiple-choice | —Unverified | 0 |
| Not All Options Are Created Equal: Textual Option Weighting for Token-Efficient LLM-Based Knowledge Tracing | Oct 14, 2024 | All, Binary Classification | —Unverified | 0 |