| FFA Sora, video generation as fundus fluorescein angiography simulator | Dec 23, 2024 | Privacy Preserving, Question Answering | Unverified | 0 |
| Survey of Large Multimodal Model Datasets, Application Categories and Taxonomy | Dec 23, 2024 | Image Captioning, Question Answering | Unverified | 0 |
| Cross-Lingual Text-Rich Visual Comprehension: An Information Theory Perspective | Dec 23, 2024 | Question Answering, Visual Question Answering | Code Available | 0 |
| Prompting Large Language Models with Rationale Heuristics for Knowledge-based Visual Question Answering | Dec 22, 2024 | Question Answering, Visual Question Answering | Unverified | 0 |
| SilVar: Speech Driven Multimodal Model for Reasoning Visual Question Answering and Object Localization | Dec 21, 2024 | Image Captioning, Multimodal Reasoning | Code Available | 0 |
| NeSyCoCo: A Neuro-Symbolic Concept Composer for Compositional Generalization | Dec 20, 2024 | Compositional Generalization (AVG), Novel Concepts | Code Available | 0 |
| Defeasible Visual Entailment: Benchmark, Evaluator, and Reward-Driven Optimization | Dec 19, 2024 | Contrastive Learning, Decision Making | Code Available | 1 |
| AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving | Dec 19, 2024 | Autonomous Driving, Benchmarking | Code Available | 2 |
| Unveiling Uncertainty: A Deep Dive into Calibration and Performance of Multimodal Large Language Models | Dec 19, 2024 | Autonomous Driving, Image Captioning | Code Available | 0 |
| FedPIA -- Permuting and Integrating Adapters leveraging Wasserstein Barycenters for Finetuning Foundation Models in Multi-Modal Federated Learning | Dec 19, 2024 | Federated Learning, parameter-efficient fine-tuning | Unverified | 0 |