Document Visual Question Answering Challenge 2020 Aug 20, 2020 Question Answering Retrieval
Does my multimodal model learn cross-modal interactions? It's harder to tell than you might think! Oct 13, 2020 Diagnostic Image-text Classification
Do Explanations make VQA Models more Predictable to a Human? Oct 29, 2018 Question Answering Visual Question Answering
Domain-robust VQA with diverse datasets and methods but no target labels Mar 29, 2021 Domain Adaptation Object Recognition
DoReMi: Grounding Language Model by Detecting and Recovering from Plan-Execution Misalignment Jul 1, 2023 Language Modeling Language Modelling
D-Rax: Domain-specific Radiologic assistant leveraging multi-modal data and eXpert model predictions Jul 2, 2024 Diagnostic Instruction Following
Dual Capsule Attention Mask Network with Mutual Learning for Visual Question Answering Oct 1, 2022 Question Answering Visual Question Answering
DualNet: Domain-Invariant Network for Visual Question Answering Jun 20, 2016 Question Answering Visual Question Answering
DUBLIN -- Document Understanding By Language-Image Network May 23, 2023 Document Classification document understanding
DVLTA-VQA: Decoupled Vision-Language Modeling with Text-Guided Adaptation for Blind Video Quality Assessment Apr 16, 2025 Language Modeling Language Modelling
Dynamic Fusion with Intra- and Inter- Modality Attention Flow for Visual Question Answering Dec 13, 2018 Question Answering Visual Question Answering
Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual Question Answering Jun 1, 2019 Question Answering Visual Question Answering
Dynamic Inference With Grounding Based Vision and Language Models Jan 1, 2023 Language Modelling Referring Expression
DynRsl-VLM: Enhancing Autonomous Driving Perception with Dynamic Resolution Vision-Language Models Mar 14, 2025 Autonomous Driving Computational Efficiency
eaVQA: An Experimental Analysis on Visual Question Answering Models Dec 1, 2021 Question Answering Visual Question Answering
EBMs vs. CL: Exploring Self-Supervised Visual Pretraining for Visual Question Answering Jun 29, 2022 Contrastive Learning Out of Distribution (OOD) Detection
EchoSight: Advancing Visual-Language Models with Wiki Knowledge Jul 17, 2024 Articles Question Answering
Edit me: A Corpus and a Framework for Understanding Natural Language Image Editing May 1, 2018 Image Captioning Question Answering
Efficient Bilinear Attention-based Fusion for Medical Visual Question Answering Oct 28, 2024 Computational Efficiency Decision Making
Efficient Few-Shot Continual Learning in Vision-Language Models Feb 6, 2025 Continual Learning Image Captioning
Efficient Quantum Gradient and Higher-order Derivative Estimation via Generalized Hadamard Test Aug 10, 2024 Visual Question Answering (VQA)
ElectroVizQA: How well do Multi-modal LLMs perform in Electronics Visual Question Answering? Nov 27, 2024 Question Answering Visual Question Answering
Eliminating Catastrophic Interference with Biased Competition Jul 3, 2020 Question Answering Visual Question Answering
Eliminating the Language Bias for Visual Question Answering with fine-grained Causal Intervention Oct 14, 2024 Contrastive Learning counterfactual
Embodied Scene Understanding for Vision Language Models via MetaVQA Jan 15, 2025 Decision Making Question Answering
EmoAssist: Emotional Assistant for Visual Impairment Community Feb 13, 2025 Emotional Intelligence Question Answering
Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation Nov 16, 2021 Image Captioning Knowledge Distillation
Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation Mar 12, 2022 Image Captioning Knowledge Distillation
Encyclopedic VQA: Visual questions about detailed properties of fine-grained categories Jun 15, 2023 Question Answering Retrieval
Enforcing Reasoning in Visual Commonsense Reasoning Oct 21, 2019 Question Answering Reinforcement Learning
Enhanced Multimodal RAG-LLM for Accurate Visual Question Answering Dec 30, 2024 Image Captioning Object Recognition
Enhanced Textual Feature Extraction for Visual Question Answering: A Simple Convolutional Approach May 1, 2024 Computational Efficiency Question Answering
Enhancing Financial VQA in Vision Language Models using Intermediate Structured Representations Jan 8, 2025 Visual Question Answering (VQA)
Enhancing Generalization in Medical Visual Question Answering Tasks via Gradient-Guided Model Perturbation Mar 5, 2024 Data Augmentation Medical Visual Question Answering
Enhancing Instruction-Following Capability of Visual-Language Models by Reducing Image Redundancy Nov 23, 2024 Instruction Following MME
Enhancing Multi-hop Reasoning in Vision-Language Models via Self-Distillation with Multi-Prompt Ensembling Mar 3, 2025 Answer Generation Computational Efficiency
Enhancing Multi-Image Question Answering via Submodular Subset Selection May 15, 2025 Question Answering Retrieval
Enhancing Self-Consistency and Performance of Pre-Trained Language Models through Natural Language Inference Nov 21, 2022 Natural Language Inference Question Answering
Enhancing Visual Question Answering through Ranking-Based Hybrid Training and Multimodal Fusion Aug 14, 2024 Question Answering Visual Question Answering
Entity-Focused Dense Passage Retrieval for Outside-Knowledge Visual Question Answering Oct 18, 2022 Passage Retrieval Question Answering
Erasure for Advancing: Dynamic Self-Supervised Learning for Commonsense Reasoning Jan 1, 2021 Question Answering Self-Supervised Learning
ErgoChat: a Visual Query System for the Ergonomic Risk Assessment of Construction Workers Dec 27, 2024 Image Captioning Question Answering
ERNIE-UniX2: A Unified Cross-lingual Cross-modal Framework for Understanding and Generation Nov 9, 2022 Contrastive Learning Decoder
ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph Jun 30, 2020 Attribute Prediction
Estimating semantic structure for the VQA answer space Jun 10, 2020 General Classification Question Answering
ESVQA: Perceptual Quality Assessment of Egocentric Spatial Videos Dec 29, 2024 Video Quality Assessment Visual Question Answering (VQA)
Evaluating Attribute Confusion in Fashion Text-to-Image Generation Jul 9, 2025 Attribute cross-modal alignment
Evaluating the Capabilities of Multi-modal Reasoning Models with Synthetic Task Data Jun 1, 2023 Anomaly Detection Image Generation
Evaluating the Representational Hub of Language and Vision Models Apr 12, 2019 Diagnostic Question Answering
Evaluating Zero-Shot GPT-4V Performance on 3D Visual Question Answering Benchmarks May 29, 2024 Question Answering Visual Question Answering