Analyzing Modular Approaches for Visual Question Decomposition Nov 10, 2023 Code Generation Visual Question Answering (VQA)
OmniNet: A unified architecture for multi-modal multi-task learning Jul 17, 2019 Image Captioning Multi-Task Learning
Robustness through Data Augmentation Loss Consistency Oct 21, 2021 Multi-domain Dialogue State Tracking Visual Question Answering
D3: Data Diversity Design for Systematic Generalization in Visual Question Answering Sep 15, 2023 Diversity Question Answering
BabelBench: An Omni Benchmark for Code-Driven Analysis of Multimodal and Multistructured Data Oct 1, 2024 Code Generation Logical Reasoning
No Images, No Problem: Retaining Knowledge in Continual VQA with Questions-Only Memory Feb 6, 2025 Continual Learning Question Answering
Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning Mar 6, 2020 Density Estimation Noise Estimation
cViL: Cross-Lingual Training of Vision-Language Models using Knowledge Distillation Jun 7, 2022 Knowledge Distillation Question Answering
Neural Module Networks Nov 9, 2015 Visual Question Answering Visual Question Answering (VQA)
AdaVQA: Overcoming Language Priors with Adapted Margin Cosine Loss May 5, 2021 Question Answering Visual Question Answering
Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding Oct 4, 2018 Question Answering Representation Learning
Noise-Induced Barren Plateaus in Variational Quantum Algorithms Jul 28, 2020 Visual Question Answering (VQA)
NAAQA: A Neural Architecture for Acoustic Question Answering Jun 11, 2021 Acoustic Question Answering Question Answering
Navigating Cultural Chasms: Exploring and Unlocking the Cultural POV of Text-To-Image Models Oct 3, 2023 Image Generation Visual Question Answering (VQA)
Multiview Contrastive Learning for Completely Blind Video Quality Assessment of User Generated Content Jul 13, 2022 Contrastive Learning Optical Flow Estimation
12-in-1: Multi-Task Vision and Language Representation Learning Dec 5, 2019 10-shot image generation Image Retrieval
MUREL: Multimodal Relational Reasoning for Visual Question Answering Feb 25, 2019 Relational Reasoning Visual Question Answering
Cross-Modal Transferable Image-to-Video Attack on Video Quality Metrics Jan 14, 2025 Video Quality Assessment Visual Question Answering (VQA)
Automatic Generation of Contrast Sets from Scene Graphs: Probing the Compositional Consistency of GQA Mar 17, 2021 Question Answering Relational Reasoning
MUTAN: Multimodal Tucker Fusion for Visual Question Answering May 18, 2017 Visual Question Answering Visual Question Answering (VQA)
NeSyCoCo: A Neuro-Symbolic Concept Composer for Compositional Generalization Dec 20, 2024 Compositional Generalization (AVG) Novel Concepts
No-Reference Video Quality Assessment Based on Benford’s Law and Perceptual Features Nov 12, 2021 No-Reference Image Quality Assessment Video Quality Assessment
Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism Apr 29, 2024 document understanding GPU
Multiple interaction learning with question-type prior knowledge for constraining answer search space in visual question answering Sep 23, 2020 Question Answering Visual Question Answering
Cross-Modal Contrastive Learning for Robust Reasoning in VQA Nov 21, 2022 Contrastive Learning Question Answering
AutoBench-V: Can Large Vision-Language Models Benchmark Themselves? Oct 28, 2024 Benchmarking Question Answering
Cross-Lingual Text-Rich Visual Comprehension: An Information Theory Perspective Dec 23, 2024 Question Answering Visual Question Answering
A Unified Hallucination Mitigation Framework for Large Vision-Language Models Sep 24, 2024 Hallucination Question Answering
Multiscale Byte Language Models -- A Hierarchical Architecture for Causal Million-Length Sequence Modeling Feb 20, 2025 Decoder GPU
CRIPP-VQA: Counterfactual Reasoning about Implicit Physical Properties via Video Question Answering Nov 7, 2022 Add - PO Add - PQ
Augmenting Visual Question Answering with Semantic Frame Information in a Multitask Learning Approach Jan 31, 2020 Question Answering Visual Question Answering
Multimodal Hypothetical Summary for Retrieval-based Multi-image Question Answering Dec 19, 2024 Contrastive Learning Language Modeling
Multimodal Residual Learning for Visual QA Jun 5, 2016 Multiple-choice Question Answering
Multi-Sourced Compositional Generalization in Visual Question Answering May 29, 2025 Question Answering Visual Question Answering
A Dataset and Architecture for Visual Reasoning with a Working Memory Mar 16, 2018 Diagnostic Logical Reasoning
Counting Everyday Objects in Everyday Scenes Apr 12, 2016 Object Object Counting
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding Jun 6, 2016 Phrase Grounding Visual Grounding
Attribute Diversity Determines the Systematicity Gap in VQA Nov 15, 2023 Attribute Diagnostic
Multimodal Explanations: Justifying Decisions and Pointing to the Evidence Feb 15, 2018 Activity Recognition Explainable Models
Multi-modal Factorized Bilinear Pooling with Co-Attention Learning for Visual Question Answering Aug 4, 2017 Question Answering Visual Question Answering
Adaptive Score Alignment Learning for Continual Perceptual Quality Assessment of 360-Degree Videos in Virtual Reality Feb 27, 2025 Video Quality Assessment Visual Question Answering (VQA)
Copy-Move Forgery Detection and Question Answering for Remote Sensing Image Dec 3, 2024 Question Answering Visual Question Answering
Attention on Attention: Architectures for Visual Question Answering (VQA) Mar 21, 2018 GPU Question Answering
Convincing Rationales for Visual Question Answering Reasoning Feb 6, 2024 Question Answering Visual Question Answering
Multi-Image Visual Question Answering Dec 27, 2021 Question Answering Visual Question Answering
Contrastive Visual-Linguistic Pretraining Jul 26, 2020 Contrastive Learning regression
Continual VQA for Disaster Response Systems Sep 21, 2022 Disaster Response Management
Context-VQA: Towards Context-Aware and Purposeful Visual Question Answering Jul 28, 2023 Question Answering Visual Question Answering
Multi-Target Embodied Question Answering Apr 9, 2019 Embodied Question Answering Navigate
On Modality Bias in the TVQA Dataset Dec 18, 2020 Question Answering Video Question Answering