| KnowledgePrompts: Exploring the Abilities of Large Language Models to Solve Proportional Analogies via Knowledge-Enhanced Prompting | Dec 1, 2024 | Multiple-choice, Multiple Choice Question Answering (MCQA) | Code Available | 0 |
| Cognitive Biases in Large Language Models: A Survey and Mitigation Experiments | Nov 30, 2024 | Multiple-choice | Unverified | 0 |
| Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark | Nov 29, 2024 | Benchmarking, Grounded Video Question Answering | Unverified | 0 |
| Applying IRT to Distinguish Between Human and Generative AI Responses to Multiple-Choice Assessments | Nov 28, 2024 | Multiple-choice | Unverified | 0 |
| Sparse Attention Vectors: Generative Multimodal Model Features Are Discriminative Vision-Language Classifiers | Nov 28, 2024 | Image Captioning, image-classification | Unverified | 0 |
| Multiple Choice Learning for Efficient Speech Separation with Many Speakers | Nov 27, 2024 | Multiple-choice, Speech Separation | Unverified | 0 |
| CHOICE: Benchmarking the Remote Sensing Capabilities of Large Vision-Language Models | Nov 27, 2024 | Benchmarking, Earth Observation | Code Available | 1 |
| NEMO: Can Multimodal LLMs Identify Attribute-Modified Objects? | Nov 26, 2024 | Attribute, Multiple-choice | Unverified | 0 |
| GEMeX: A Large-Scale, Groundable, and Explainable Medical VQA Benchmark for Chest X-ray Diagnosis | Nov 25, 2024 | Medical Visual Question Answering, Multiple-choice | Unverified | 0 |
| All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages | Nov 25, 2024 | All, Long Question Answer | Code Available | 1 |