| HashEvict: A Pre-Attention KV Cache Eviction Strategy using Locality-Sensitive Hashing | Dec 13, 2024 | GPUMultiple-choice | —Unverified | 0 |
| A multimodal dataset for understanding the impact of mobile phones on remote online virtual education | Dec 13, 2024 | EEGHead Pose Estimation | CodeCode Available | 0 |
| LLM Distillation for Efficient Few-Shot Multiple Choice Question Answering | Dec 13, 2024 | Few-Shot LearningKnowledge Distillation | —Unverified | 0 |
| Does Multiple Choice Have a Future in the Age of Generative AI? A Posttest-only RCT | Dec 13, 2024 | Multiple-choice | CodeCode Available | 0 |
| Neptune: The Long Orbit to Benchmarking Long Video Understanding | Dec 12, 2024 | BenchmarkingMultimodal Reasoning | CodeCode Available | 2 |
| Filter-then-Generate: Large Language Models with Structure-Text Adapter for Knowledge Graph Completion | Dec 12, 2024 | HallucinationKnowledge Graph Completion | CodeCode Available | 1 |
| MM-PoE: Multiple Choice Reasoning via. Process of Elimination using Multi-Modal Models | Dec 10, 2024 | Multiple-choiceQuestion Answering | CodeCode Available | 0 |
| Evaluating and Mitigating Social Bias for Large Language Models in Open-ended Settings | Dec 9, 2024 | Multiple-choice | CodeCode Available | 0 |
| ACQ: A Unified Framework for Automated Programmatic Creativity in Online Advertising | Dec 9, 2024 | Multiple-choiceMulti-Task Learning | —Unverified | 0 |
| Learning to Correction: Explainable Feedback Generation for Visual Commonsense Reasoning Distractor | Dec 8, 2024 | MisconceptionsMultiple-choice | CodeCode Available | 0 |
| MANTA: A Large-Scale Multi-View and Visual-Text Anomaly Detection Dataset for Tiny Objects | Dec 6, 2024 | 2kAnomaly Detection | —Unverified | 0 |
| Establishing Task Scaling Laws via Compute-Efficient Model Ladders | Dec 5, 2024 | Language ModelingLanguage Modelling | —Unverified | 0 |
| GRAF: Graph Retrieval Augmented by Facts for Romanian Legal Multi-Choice Question Answering | Dec 5, 2024 | Information RetrievalMultiple-choice | —Unverified | 0 |
| AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information? | Dec 3, 2024 | Multiple-choice | CodeCode Available | 1 |
| SailCompass: Towards Reproducible and Robust Evaluation for Southeast Asian Languages | Dec 2, 2024 | Multiple-choice | CodeCode Available | 1 |
| Noise Injection Reveals Hidden Capabilities of Sandbagging Language Models | Dec 2, 2024 | MMLUMultiple-choice | CodeCode Available | 0 |
| The use of large language models to enhance cancer clinical trial educational materials | Dec 2, 2024 | MisinformationMultiple-choice | —Unverified | 0 |
| Unlocking Video-LLM via Agent-of-Thoughts Distillation | Dec 2, 2024 | Language ModelingLanguage Modelling | —Unverified | 0 |
| Uhura: A Benchmark for Evaluating Scientific Question Answering and Truthfulness in Low-Resource African Languages | Dec 1, 2024 | ARCMultiple-choice | —Unverified | 0 |
| VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception of Geometric Information | Dec 1, 2024 | Multiple-choice | CodeCode Available | 1 |
| KnowledgePrompts: Exploring the Abilities of Large Language Models to Solve Proportional Analogies via Knowledge-Enhanced Prompting | Dec 1, 2024 | Multiple-choiceMultiple Choice Question Answering (MCQA) | CodeCode Available | 0 |
| Cognitive Biases in Large Language Models: A Survey and Mitigation Experiments | Nov 30, 2024 | Multiple-choice | —Unverified | 0 |
| Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark | Nov 29, 2024 | BenchmarkingGrounded Video Question Answering | —Unverified | 0 |
| Applying IRT to Distinguish Between Human and Generative AI Responses to Multiple-Choice Assessments | Nov 28, 2024 | Multiple-choice | —Unverified | 0 |
| Sparse Attention Vectors: Generative Multimodal Model Features Are Discriminative Vision-Language Classifiers | Nov 28, 2024 | Image Captioningimage-classification | —Unverified | 0 |