| Localizing AI: Evaluating Open-Weight Language Models for Languages of Baltic States | Jan 7, 2025 | Machine TranslationMultiple-choice | —Unverified | 0 |
| Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation | Jan 6, 2025 | Language Model EvaluationLanguage Modeling | CodeCode Available | 1 |
| (WhyPHI) Fine-Tuning PHI-3 for Multiple-Choice Question Answering: Methodology, Results, and Challenges | Jan 3, 2025 | Multiple-choiceQuestion Answering | CodeCode Available | 0 |
| CLIP-UP: CLIP-Based Unanswerable Problem Detection for Visual Question Answering | Jan 2, 2025 | Multiple-choiceQuestion Answering | —Unverified | 0 |
| Unifying Specialized Visual Encoders for Video Language Models | Jan 2, 2025 | Multiple-choiceVideo Understanding | CodeCode Available | 1 |
| Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation | Jan 1, 2025 | Language ModelingLanguage Modelling | —Unverified | 0 |
| Q-Bench-Video: Benchmark the Video Quality Understanding of LMMs | Jan 1, 2025 | Multiple-choiceVideo Generation | —Unverified | 0 |
| Separation of Powers: On Segregating Knowledge from Observation in LLM-enabled Knowledge-based Visual Question Answering | Jan 1, 2025 | Multiple-choiceQuestion Answering | —Unverified | 0 |
| FSBench: A Figure Skating Benchmark for Advancing Artistic Sports Understanding | Jan 1, 2025 | Action RecognitionMultiple-choice | CodeCode Available | 0 |
| IllusionBench: A Large-scale and Comprehensive Benchmark for Visual Illusion Understanding in Vision-Language Models | Jan 1, 2025 | HallucinationMultiple-choice | —Unverified | 0 |