| Noise Injection Reveals Hidden Capabilities of Sandbagging Language Models | Dec 2, 2024 | MMLU, Multiple-choice | Code Available | 0 |
| Spoken Language Intelligence of Large Language Models for Language Learning | Aug 28, 2023 | Language Acquisition, Multiple-choice | Code Available | 0 |
| ReGraP-LLaVA: Reasoning enabled Graph-based Personalized Large Language and Vision Assistant | May 6, 2025 | Descriptive, Multiple-choice | Code Available | 0 |
| LLMs Are Not Intelligent Thinkers: Introducing Mathematical Topic Tree Benchmark for Comprehensive Evaluation of LLMs | Jun 7, 2024 | Mathematical Reasoning, Multiple-choice | Code Available | 0 |
| Balancing Rigor and Utility: Mitigating Cognitive Biases in Large Language Models for Multiple-Choice Questions | Jun 16, 2024 | Decision Making, Language Modelling | Code Available | 0 |
| What Makes Reading Comprehension Questions Difficult? | Mar 12, 2022 | Logical Reasoning, Multiple-choice | Code Available | 0 |
| Wait, that's not an option: LLMs Robustness with Incorrect Multiple-Choice Options | Aug 27, 2024 | Decision Making, Multiple-choice | Code Available | 0 |
| COLUMBUS: Evaluating COgnitive Lateral Understanding through Multiple-choice reBUSes | Sep 6, 2024 | Multiple-choice, Question Answering | Code Available | 0 |
| An Information-Theoretic Approach to Analyze NLP Classification Tasks | Feb 1, 2024 | Multiple-choice, Reading Comprehension | Code Available | 0 |
| World Knowledge in Multiple Choice Reading Comprehension | Nov 13, 2022 | General Knowledge, Multiple-choice | Code Available | 0 |