Multimodal Explanations: Justifying Decisions and Pointing to the Evidence Feb 15, 2018 Activity Recognition Explainable Models
MHSAN: Multi-Head Self-Attention Network for Visual Semantic Embedding Jan 11, 2020 Image Captioning Image-text Retrieval
Enhancing Vietnamese VQA through Curriculum Learning on Raw and Augmented Text Representations Mar 5, 2025 Question Answering Visual Question Answering
Enhancing the AI2 Diagrams Dataset Using Rhetorical Structure Theory May 1, 2018 Question Answering Visual Question Answering (VQA)
Medical Large Vision Language Models with Multi-Image Visual Ability May 25, 2025 Question Answering Visual Question Answering (VQA)
Med-PMC: Medical Personalized Multi-modal Consultation with a Proactive Ask-First-Observe-Next Paradigm Aug 16, 2024 Decision Making Medical Visual Question Answering
Mimic and Fool: A Task Agnostic Adversarial Attack Jun 11, 2019 Adversarial Attack Image Captioning
Measuring Faithful and Plausible Visual Grounding in VQA May 24, 2023 Question Answering Visual Grounding
Enhancing Continual Learning in Visual Question Answering with Modality-Aware Feature Distillation Jun 27, 2024 Continual Learning Question Answering
μ-Bench: A Vision-Language Benchmark for Microscopy Understanding Jul 1, 2024 Cell Detection Classification
Answer Them All! Toward Universal Visual Question Answering Models Mar 1, 2019 All Question Answering
MedHallTune: An Instruction-Tuning Benchmark for Mitigating Medical Hallucination in Vision-Language Models Feb 28, 2025 Decision Making Hallucination
End-to-end optimization of goal-driven and visually grounded dialogue systems Mar 15, 2017 Decoder Deep Reinforcement Learning
End-to-End Instance Segmentation with Recurrent Attention May 30, 2016 Autonomous Driving Image Captioning
Answer Questions with Right Image Regions: A Visual Attention Regularization Approach Feb 3, 2021 Question Answering Visual Grounding
End-to-End Audio Visual Scene-Aware Dialog using Multimodal Attention-Based Video Features Jun 21, 2018 Question Answering Video Description
Adversarial Training with OCR Modality Perturbation for Scene-Text Visual Question Answering Mar 14, 2024 Optical Character Recognition Optical Character Recognition (OCR)
MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks Mar 29, 2023 Cross-Modal Retrieval Decoder
M^2ConceptBase: A Fine-Grained Aligned Concept-Centric Multimodal Knowledge Base Dec 16, 2023 cross-modal alignment Knowledge Graphs
Visual Contexts Clarify Ambiguous Expressions: A Benchmark Dataset Nov 21, 2024 Question Answering Visual Grounding
ELIP: Efficient Language-Image Pre-training with Fewer Vision Tokens Sep 28, 2023 Cross-Modal Retrieval GPU
Answering Questions about Data Visualizations using Efficient Bimodal Fusion Aug 5, 2019 Chart Question Answering Optical Character Recognition
LXMERT Model Compression for Visual Question Answering Oct 23, 2023 model Model Compression
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking Apr 18, 2022 cross-modal alignment Document AI
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering Dec 2, 2016 Visual Question Answering Visual Question Answering (VQA)
Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding Mar 18, 2025 document understanding Question Answering
Logical Implications for Visual Question Answering Consistency Mar 16, 2023 Language Modeling Language Modelling
Locally Smoothed Neural Networks Nov 22, 2017 Face Verification Question Answering
Effective Approaches to Batch Parallelization for Dynamic Neural Network Architectures Jul 8, 2017 Mixture-of-Experts Question Answering
Looking Beyond Visible Cues: Implicit Video Question Answering via Dual-Clue Reasoning Jun 9, 2025 Future prediction Question Answering
Visuo-Linguistic Question Answering (VLQA) Challenge May 1, 2020 Question Answering Reading Comprehension
Visual Question Answering: A Survey of Methods and Datasets Jul 20, 2016 General Knowledge Survey
LLM-Assisted Multi-Teacher Continual Learning for Visual Question Answering in Robotic Surgery Feb 26, 2024 Continual Learning Exemplar-Free
Learning content and context with language bias for Visual Question Answering Dec 21, 2020 Question Answering Visual Question Answering
Learning Convolutional Text Representations for Visual Question Answering May 18, 2017 General Classification image-classification
Bridging Languages through Images with Deep Partial Canonical Correlation Analysis Jul 1, 2018 Image Description Image Retrieval
ECG Heartbeat Classification: A Deep Transferable Representation Apr 19, 2018 Arrhythmia Detection Electrocardiography (ECG)
Bridge the Gap Between VQA and Human Behavior on Omnidirectional Video: A Large-Scale Dataset and a Deep Learning Model Jul 29, 2018 Visual Question Answering (VQA)
BioD2C: A Dual-level Semantic Consistency Constraint Framework for Biomedical VQA Mar 4, 2025 Medical Diagnosis Question Answering
Learning Representations of Sets through Optimized Permutations Dec 10, 2018 General Classification Question Answering
LMM-VQA: Advancing Video Quality Assessment with Large Multimodal Models Aug 26, 2024 Large Language Model Video Quality Assessment
Loss re-scaling VQA: Revisiting the Language Prior Problem from a Class-imbalance View Oct 30, 2020 Face Recognition image-classification
EaSe: A Diagnostic Tool for VQA based on Answer Diversity Jun 1, 2021 Diagnostic Diversity
Dynamic Task and Weight Prioritization Curriculum Learning for Multimodal Imagery Oct 29, 2023 Deep Learning Multimodal Deep Learning
LININ: Logic Integrated Neural Inference Network for Explanatory Visual Question Answering Dec 24, 2024 Explanatory Visual Question Answering Multimodal Reasoning
Dynamic Memory Networks for Visual and Textual Question Answering Mar 4, 2016 Question Answering Visual Question Answering
Targeted Visual Prompting for Medical Visual Question Answering Aug 6, 2024 Medical Visual Question Answering Question Answering
LLaVA-OneVision: Easy Visual Task Transfer Aug 6, 2024 3D Question Answering (3D-QA)
Learning to Collocate Visual-Linguistic Neural Modules for Image Captioning Oct 4, 2022 Image Captioning Sentence
LPF: A Language-Prior Feedback Objective Function for De-biased Visual Question Answering May 29, 2021 Question Answering Visual Question Answering