| Improving LLM First-Token Predictions in Multiple-Choice Question Answering via Prefilling Attack | May 21, 2025 | Multiple-choice, Multiple Choice Question Answering (MCQA) | —Unverified | 0 | 0 |
| Analysing the Effect of Masking Length Distribution of MLM: An Evaluation Framework and Case Study on Chinese MRC Datasets | Sep 29, 2021 | Language Modelling, Machine Reading Comprehension | —Unverified | 0 | 0 |
| Improving the Production Efficiency and Well-formedness of Automatically-Generated Multiple-Choice Cloze Vocabulary Questions | May 1, 2020 | Multiple-choice | —Unverified | 0 | 0 |
| In Case You Missed It: ARC 'Challenge' Is Not That Challenging | Dec 23, 2024 | ARC, Multiple-choice | —Unverified | 0 | 0 |
| TVBench: Redesigning Video-Language Evaluation | Oct 10, 2024 | Multiple-choice, Open-Ended Question Answering | —Unverified | 0 | 0 |
| Indirect Identification of Psychosocial Risks from Natural Language | Apr 30, 2020 | Multiple-choice, Topic Models | —Unverified | 0 | 0 |
| Inferring from Logits: Exploring Best Practices for Decoding-Free Generative Candidate Selection | Jan 28, 2025 | Multiple-choice | —Unverified | 0 | 0 |
| Two-Turn Debate Doesn't Help Humans Answer Hard Reading Comprehension Questions | Oct 19, 2022 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| InnerThoughts: Disentangling Representations and Predictions in Large Language Models | Jan 29, 2025 | Multiple-choice, Position | —Unverified | 0 | 0 |
| InstructionBench: An Instructional Video Understanding Benchmark | Apr 7, 2025 | Common Sense Reasoning, Multiple-choice | —Unverified | 0 | 0 |