Abduction of Domain Relationships from Data for VQA | Feb 13, 2025 | Question Answering, Visual Question Answering | Unverified | 0
Visual Graph Question Answering with ASP and LLMs for Language Parsing | Feb 13, 2025 | Graph Question Answering, Optical Character Recognition | Unverified | 0
ClinKD: Cross-Modal Clinical Knowledge Distiller For Multi-Task Medical Images | Feb 9, 2025 | Clinical Knowledge, Medical Visual Question Answering | Code Available | 0
Performance Analysis of Traditional VQA Models Under Limited Computational Resources | Feb 9, 2025 | Question Answering, Visual Question Answering | Unverified | 0
Hummingbird: High Fidelity Image Generation via Multimodal Context Alignment | Feb 7, 2025 | Diversity, Human-Object Interaction Detection | Unverified | 0
HD-EPIC: A Highly-Detailed Egocentric Video Dataset | Feb 6, 2025 | Action Recognition, Nutrition | Unverified | 0
No Images, No Problem: Retaining Knowledge in Continual VQA with Questions-Only Memory | Feb 6, 2025 | Continual Learning, Question Answering | Code Available | 0
Content-Rich AIGC Video Quality Assessment via Intricate Text Alignment and Motion-Aware Consistency | Feb 6, 2025 | Video Generation, Video Quality Assessment | Code Available | 1
Efficient Few-Shot Continual Learning in Vision-Language Models | Feb 6, 2025 | Continual Learning, Image Captioning | Unverified | 0
Variational Quantum Optimization with Continuous Bandits | Feb 6, 2025 | Visual Question Answering (VQA) | Code Available | 0
Robust-LLaVA: On the Effectiveness of Large-Scale Robust Image Encoders for Multi-modal Large Language Models | Feb 3, 2025 | Adversarial Robustness, Image Captioning | Code Available | 1
VLM-Assisted Continual learning for Visual Question Answering in Self-Driving | Feb 2, 2025 | Autonomous Driving, Continual Learning | Unverified | 0
Hypo3D: Exploring Hypothetical Reasoning in 3D | Feb 2, 2025 | Question Answering, Visual Question Answering | Unverified | 0
Large Models in Dialogue for Active Perception and Anomaly Detection | Jan 27, 2025 | Anomaly Detection, Question Answering | Code Available | 0
Scaling Large Vision-Language Models for Enhanced Multimodal Comprehension In Biomedical Image Analysis | Jan 26, 2025 | Articles, Hallucination | Unverified | 0
Scene Understanding Enabled Semantic Communication with Open Channel Coding | Jan 24, 2025 | Question Answering, Scene Understanding | Unverified | 0
Combining Knowledge Graph and LLMs for Enhanced Zero-shot Visual Question Answering | Jan 22, 2025 | Knowledge Graphs, Question Answering | Unverified | 0
Patent Figure Classification using Large Vision-language Models | Jan 22, 2025 | Classification, Few-Shot Learning | Code Available | 0
Can Multimodal LLMs do Visual Temporal Understanding and Reasoning? The answer is No! | Jan 18, 2025 | Multiple-choice, Question Answering | Unverified | 0
Embodied Scene Understanding for Vision Language Models via MetaVQA | Jan 15, 2025 | Decision Making, Question Answering | Unverified | 0
Cross-Modal Transferable Image-to-Video Attack on Video Quality Metrics | Jan 14, 2025 | Video Quality Assessment, Visual Question Answering (VQA) | Code Available | 0
Video Quality Assessment for Online Processing: From Spatial to Temporal Sampling | Jan 13, 2025 | Video Quality Assessment, Video Understanding | Unverified | 0
The Quest for Visual Understanding: A Journey Through the Evolution of Visual Question Answering | Jan 13, 2025 | Common Sense Reasoning, Question Answering | Unverified | 0
Overcoming Language Priors for Visual Question Answering Based on Knowledge Distillation | Jan 10, 2025 | Knowledge Distillation, Question Answering | Unverified | 0
ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding | Jan 9, 2025 | Visual Question Answering (VQA), Visual Reasoning | Code Available | 2
Commonsense Video Question Answering through Video-Grounded Entailment Tree Reasoning | Jan 9, 2025 | Benchmarking, Question Answering | Unverified | 0
Enhancing Financial VQA in Vision Language Models using Intermediate Structured Representations | Jan 8, 2025 | Visual Question Answering (VQA) | Unverified | 0
Visual question answering: from early developments to recent advances -- a survey | Jan 7, 2025 | Descriptive, Natural Language Understanding | Unverified | 0
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token | Jan 7, 2025 | GPU, Visual Question Answering (VQA) | Code Available | 4
ReDiT: Re-evaluating large visual question answering model confidence by defining input scenario Difficulty and applying Temperature mapping | Jan 6, 2025 | Question Answering, Visual Question Answering | Code Available | 0
Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation | Jan 6, 2025 | Language Model Evaluation, Language Modeling | Code Available | 1
Multilevel Semantic-Aware Model for AI-Generated Video Quality Assessment | Jan 6, 2025 | Video Quality Assessment, Visual Question Answering (VQA) | Unverified | 0
Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs? | Jan 5, 2025 | Image Captioning, Image to text | Code Available | 1
Accounting for Focus Ambiguity in Visual Questions | Jan 4, 2025 | Question Answering, Visual Question Answering | Unverified | 0
Guiding Medical Vision-Language Models with Explicit Visual Prompts: Framework Design and Comprehensive Exploration of Prompt Variations | Jan 4, 2025 | Decoder, Visual Question Answering (VQA) | Unverified | 0
Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models | Jan 3, 2025 | Binary Classification, Face Anti-Spoofing | Unverified | 0
MoColl: Agent-Based Specific and General Model Collaboration for Image Captioning | Jan 3, 2025 | Diagnostic, General Knowledge | Unverified | 0
CLIP-UP: CLIP-Based Unanswerable Problem Detection for Visual Question Answering | Jan 2, 2025 | Multiple-choice, Question Answering | Unverified | 0
V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts | Jan 1, 2025 | Contrastive Learning, Text Retrieval | Unverified | 0
F^3OCUS - Federated Finetuning of Vision-Language Foundation Models with Optimal Client Layer Updating Strategy via Multi-objective Meta-Heuristics | Jan 1, 2025 | Diversity, Federated Learning | Unverified | 0
JTD-UAV: MLLM-Enhanced Joint Tracking and Description Framework for Anti-UAV Systems | Jan 1, 2025 | Question Answering, Visual Question Answering | Unverified | 0
Alignment, Mining and Fusion: Representation Alignment with Hard Negative Mining and Selective Knowledge Fusion for Medical Visual Question Answering | Jan 1, 2025 | Contrastive Learning, Medical Visual Question Answering | Unverified | 0
Notes-guided MLLM Reasoning: Enhancing MLLM with Knowledge and Visual Notes for Visual Question Answering | Jan 1, 2025 | Large Language Model, Multimodal Large Language Model | Code Available | 1
Probing Visual Language Priors in VLMs | Dec 31, 2024 | Question Answering, Visual Question Answering | Unverified | 0
Social-LLaVA: Enhancing Robot Navigation through Human-Language Reasoning in Social Spaces | Dec 30, 2024 | 2k, Robot Navigation | Unverified | 0
Investigating layer-selective transfer learning of QAOA parameters for Max-Cut problem | Dec 30, 2024 | Combinatorial Optimization, Transfer Learning | Unverified | 0
Enhanced Multimodal RAG-LLM for Accurate Visual Question Answering | Dec 30, 2024 | Image Captioning, Object Recognition | Unverified | 0
ESVQA: Perceptual Quality Assessment of Egocentric Spatial Videos | Dec 29, 2024 | Video Quality Assessment, Visual Question Answering (VQA) | Unverified | 0
HALLUCINOGEN: A Benchmark for Evaluating Object Hallucination in Large Visual-Language Models | Dec 29, 2024 | Hallucination, Object | Code Available | 0
Not all Views are Created Equal: Analyzing Viewpoint Instabilities in Vision Foundation Models | Dec 27, 2024 | 3D Reconstruction, All | Unverified | 0