| Track the Answer: Extending TextVQA from Image to Video with Spatio-Temporal Clues | Dec 17, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| LLaVA Steering: Visual Instruction Tuning with 500x Fewer Parameters through Modality Linear Representation-Steering | Dec 16, 2024 | In-Context Learning, Instruction Following | Code Available | 0 |
| CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology | Dec 16, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Overview of TREC 2024 Medical Video Question Answering (MedVidQA) Track | Dec 15, 2024 | Image Captioning, Medical Question Answering | Unverified | 0 |
| Patch-level Sounding Object Tracking for Audio-Visual Question Answering | Dec 14, 2024 | Audio-visual Question Answering, Object Tracking | Unverified | 0 |
| Damage Assessment after Natural Disasters with UAVs: Semantic Feature Extraction using Deep Learning | Dec 14, 2024 | Decision Making, Question Answering | Unverified | 0 |
| VLR-Bench: Multilingual Benchmark Dataset for Vision-Language Retrieval Augmented Generation | Dec 13, 2024 | Instruction Following, Question Answering | Unverified | 0 |
| ViUniT: Visual Unit Tests for More Robust Visual Programming | Dec 12, 2024 | Image Generation, Image-text matching | Unverified | 0 |
| Discrete Subgraph Sampling for Interpretable Graph based Visual Question Answering | Dec 11, 2024 | Explainable artificial intelligence, Explainable Artificial Intelligence (XAI) | Code Available | 0 |
| Illusory VQA: Benchmarking and Enhancing Multimodal Models on Visual Illusions | Dec 11, 2024 | Benchmarking, Question Answering | Code Available | 0 |