| A Weak Supervision Approach for Predicting Difficulty of Technical Interview Questions | Oct 1, 2022 | Multiple-choice, Prediction | —Unverified | 0 | 0 |
| Bayesian Statistical Modeling with Predictors from LLMs | Jun 13, 2024 | Multiple-choice | —Unverified | 0 | 0 |
| Being Negative but Constructively: Lessons Learnt from Creating Better Visual Question Answering Datasets | Apr 24, 2017 | Multiple-choice, Question Answering | —Unverified | 0 | 0 |
| Benchmarking Bias in Large Language Models during Role-Playing | Nov 1, 2024 | Benchmarking, Fairness | —Unverified | 0 | 0 |
| The Future of Learning in the Age of Generative AI: Automated Question Generation and Assessment with Large Language Models | Oct 12, 2024 | Misconceptions, Multiple-choice | —Unverified | 0 | 0 |
| Answer, Assemble, Ace: Understanding How Transformers Answer Multiple Choice Questions | Jul 21, 2024 | Multiple-choice, Multiple Choice Question Answering (MCQA) | —Unverified | 0 | 0 |
| The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations | Jul 17, 2025 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Benchmarking Next-Generation Reasoning-Focused Large Language Models in Ophthalmology: A Head-to-Head Evaluation on 5,888 Items | Apr 15, 2025 | Benchmarking, Multiple-choice | —Unverified | 0 | 0 |
| Benchmarks for Pirá 2.0, a Reading Comprehension Dataset about the Ocean, the Brazilian Coast, and Climate Change | Sep 19, 2023 | Generative Question Answering, Information Retrieval | —Unverified | 0 | 0 |
| Better Distractions: Transformer-based Distractor Generation and Multiple Choice Question Filtering | Oct 19, 2020 | Distractor Generation, Language Modeling | —Unverified | 0 | 0 |