| NoVo: Norm Voting off Hallucinations with Attention Heads in Large Language Models | Oct 11, 2024 | Multiple-choice, TruthfulQA | Code Available | 0 |
| Are Large Language Models Consistent over Value-laden Questions? | Jul 3, 2024 | Multiple-choice | Code Available | 0 |
| Revisiting Visual Question Answering Baselines | Jun 27, 2016 | Binary Classification, Multiple-choice | Code Available | 0 |
| LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models | Oct 13, 2024 | Hallucination, Hallucination Evaluation | Code Available | 0 |
| BUCA: A Binary Classification Approach to Unsupervised Commonsense Question Answering | May 25, 2023 | Binary Classification, Knowledge Graphs | Code Available | 0 |
| Automatic Generation and Evaluation of Reading Comprehension Test Items with Large Language Models | Apr 11, 2024 | Multiple-choice, Reading Comprehension | Code Available | 0 |
| Are Vision LLMs Road-Ready? A Comprehensive Benchmark for Safety-Critical Driving Video Understanding | Apr 20, 2025 | Autonomous Driving, Image Captioning | Code Available | 0 |
| Abductive Commonsense Reasoning | Aug 15, 2019 | Multiple-choice, Natural Language Inference | Code Available | 0 |
| A Multiple Choices Reading Comprehension Corpus for Vietnamese Language Education | Mar 31, 2023 | Articles, Machine Reading Comprehension | Code Available | 0 |
| When an LLM is apprehensive about its answers -- and when its uncertainty is justified | Mar 3, 2025 | Math, MMLU | Code Available | 0 |