| Probing Visual Language Priors in VLMs | Dec 31, 2024 | Question Answering, Visual Question Answering | Unverified | 0 |
| MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models | Dec 31, 2024 | Multiple-choice, Question Answering | Code Available | 0 |
| UniRS: Unifying Multi-temporal Remote Sensing Tasks through Vision Language Models | Dec 30, 2024 | Question Answering, Scene Classification | Code Available | 0 |
| Enhanced Multimodal RAG-LLM for Accurate Visual Question Answering | Dec 30, 2024 | Image Captioning, Object Recognition | Unverified | 0 |
| HALLUCINOGEN: A Benchmark for Evaluating Object Hallucination in Large Visual-Language Models | Dec 29, 2024 | Hallucination, Object | Code Available | 0 |
| ErgoChat: a Visual Query System for the Ergonomic Risk Assessment of Construction Workers | Dec 27, 2024 | Image Captioning, Question Answering | Unverified | 0 |
| TextMatch: Enhancing Image-Text Consistency Through Multimodal Optimization | Dec 24, 2024 | In-Context Learning, Question Answering | Unverified | 0 |
| LININ: Logic Integrated Neural Inference Network for Explanatory Visual Question Answering | Dec 24, 2024 | Explanatory Visual Question Answering, Multimodal Reasoning | Code Available | 0 |
| Multi-Agents Based on Large Language Models for Knowledge-based Visual Question Answering | Dec 24, 2024 | Question Answering, Visual Question Answering | Unverified | 0 |
| Cross-Lingual Text-Rich Visual Comprehension: An Information Theory Perspective | Dec 23, 2024 | Question Answering, Visual Question Answering | Code Available | 0 |
| FFA Sora, video generation as fundus fluorescein angiography simulator | Dec 23, 2024 | Privacy Preserving, Question Answering | Unverified | 0 |
| Multimodal Preference Data Synthetic Alignment with Reward Model | Dec 23, 2024 | 2k, Caption Generation | Code Available | 0 |
| Survey of Large Multimodal Model Datasets, Application Categories and Taxonomy | Dec 23, 2024 | Image Captioning, Question Answering | Unverified | 0 |
| Prompting Large Language Models with Rationale Heuristics for Knowledge-based Visual Question Answering | Dec 22, 2024 | Question Answering, Visual Question Answering | Unverified | 0 |
| SilVar: Speech Driven Multimodal Model for Reasoning Visual Question Answering and Object Localization | Dec 21, 2024 | Image Captioning, Multimodal Reasoning | Code Available | 0 |
| NeSyCoCo: A Neuro-Symbolic Concept Composer for Compositional Generalization | Dec 20, 2024 | Compositional Generalization (AVG), Novel Concepts | Code Available | 0 |
| Unveiling Uncertainty: A Deep Dive into Calibration and Performance of Multimodal Large Language Models | Dec 19, 2024 | Autonomous Driving, Image Captioning | Code Available | 0 |
| FedPIA -- Permuting and Integrating Adapters leveraging Wasserstein Barycenters for Finetuning Foundation Models in Multi-Modal Federated Learning | Dec 19, 2024 | Federated Learning, parameter-efficient fine-tuning | Unverified | 0 |
| Consistency of Compositional Generalization across Multiple Levels | Dec 18, 2024 | Meta-Learning, Question Answering | Code Available | 0 |
| A Concept-Centric Approach to Multi-Modality Learning | Dec 18, 2024 | Image-text matching, Question Answering | Unverified | 0 |
| Track the Answer: Extending TextVQA from Image to Video with Spatio-Temporal Clues | Dec 17, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology | Dec 16, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| LLaVA Steering: Visual Instruction Tuning with 500x Fewer Parameters through Modality Linear Representation-Steering | Dec 16, 2024 | In-Context Learning, Instruction Following | Code Available | 0 |
| Overview of TREC 2024 Medical Video Question Answering (MedVidQA) Track | Dec 15, 2024 | Image Captioning, Medical Question Answering | Unverified | 0 |
| Damage Assessment after Natural Disasters with UAVs: Semantic Feature Extraction using Deep Learning | Dec 14, 2024 | Decision Making, Question Answering | Unverified | 0 |