| A Weak Supervision Approach for Predicting Difficulty of Technical Interview Questions | Oct 1, 2022 | Multiple-choice, Prediction | —Unverified | 0 | 0 |
| Bayesian Statistical Modeling with Predictors from LLMs | Jun 13, 2024 | Multiple-choice | —Unverified | 0 | 0 |
| Being Negative but Constructively: Lessons Learnt from Creating Better Visual Question Answering Datasets | Apr 24, 2017 | Multiple-choice, Question Answering | —Unverified | 0 | 0 |
| Benchmarking Bias in Large Language Models during Role-Playing | Nov 1, 2024 | Benchmarking, Fairness | —Unverified | 0 | 0 |
| The Future of Learning in the Age of Generative AI: Automated Question Generation and Assessment with Large Language Models | Oct 12, 2024 | Misconceptions, Multiple-choice | —Unverified | 0 | 0 |
| Answer, Assemble, Ace: Understanding How Transformers Answer Multiple Choice Questions | Jul 21, 2024 | Multiple-choice, Multiple Choice Question Answering (MCQA) | —Unverified | 0 | 0 |
| The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations | Jul 17, 2025 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Benchmarking Next-Generation Reasoning-Focused Large Language Models in Ophthalmology: A Head-to-Head Evaluation on 5,888 Items | Apr 15, 2025 | Benchmarking, Multiple-choice | —Unverified | 0 | 0 |
| Benchmarks for Pirá 2.0, a Reading Comprehension Dataset about the Ocean, the Brazilian Coast, and Climate Change | Sep 19, 2023 | Generative Question Answering, Information Retrieval | —Unverified | 0 | 0 |
| Better Distractions: Transformer-based Distractor Generation and Multiple Choice Question Filtering | Oct 19, 2020 | Distractor Generation, Language Modeling | —Unverified | 0 | 0 |
| Beyond Multiple-Choice Accuracy: Real-World Challenges of Implementing Large Language Models in Healthcare | Oct 24, 2024 | Multiple-choice | —Unverified | 0 | 0 |
| Beyond Multiple Choice: Evaluating Steering Vectors for Adaptive Free-Form Summarization | May 30, 2025 | Form, Language Modeling | —Unverified | 0 | 0 |
| Beyond Probabilities: Unveiling the Misalignment in Evaluating Large Language Models | Feb 21, 2024 | Multiple-choice | —Unverified | 0 | 0 |
| Beyond Profile: From Surface-Level Facts to Deep Persona Simulation in LLMs | Feb 18, 2025 | Generative Question Answering, Multiple-choice | —Unverified | 0 | 0 |
| Not All Options Are Created Equal: Textual Option Weighting for Token-Efficient LLM-Based Knowledge Tracing | Oct 14, 2024 | All, Binary Classification | —Unverified | 0 | 0 |
| The impact of AI and peer feedback on research writing skills: a study using the CGScholar platform among Kazakhstani scholars | Mar 5, 2025 | Multiple-choice, Survey | —Unverified | 0 | 0 |
| LLMs May Perform MCQA by Selecting the Least Incorrect Option | Feb 2, 2024 | Multiple-choice, Multiple Choice Question Answering (MCQA) | —Unverified | 0 | 0 |
| Beyond VQA: Generating Multi-word Answer and Rationale to Visual Questions | Oct 24, 2020 | General Classification, Multiple-choice | —Unverified | 0 | 0 |
| ANPMI: Assessing the True Comprehension Capabilities of LLMs for Multiple Choice Questions | Feb 26, 2025 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Bilingual Evaluation of Language Models on General Knowledge in University Entrance Exams with Minimal Contamination | Sep 19, 2024 | General Knowledge, MMLU | —Unverified | 0 | 0 |
| The Impact of Item-Writing Flaws on Difficulty and Discrimination in Item Response Theory | Mar 13, 2025 | Math, Multiple-choice | —Unverified | 0 | 0 |
| A Novel Approach for Constrained Optimization in Graphical Models | Dec 1, 2020 | Multiple-choice | —Unverified | 0 | 0 |
| BiRdQA: A Bilingual Dataset for Question Answering on Tricky Riddles | Sep 23, 2021 | Multiple-choice, Question Answering | —Unverified | 0 | 0 |
| The Lazy Student's Dream: ChatGPT Passing an Engineering Course on Its Own | Feb 23, 2025 | Multiple-choice | —Unverified | 0 | 0 |
| BLINK: Multimodal Large Language Models Can See but Not Perceive | Apr 18, 2024 | Depth Estimation, Multiple-choice | —Unverified | 0 | 0 |