Unleashing the Potentials of Likelihood Composition for Multi-modal Language Models | Oct 1, 2024 | Question Answering, Visual Question Answering
MUTAN: Multimodal Tucker Fusion for Visual Question Answering | May 18, 2017 | Visual Question Answering, Visual Question Answering (VQA)
MQA: Answering the Question via Robotic Manipulation | Mar 10, 2020 | Imitation Learning, Question Answering
Unmasked Teacher: Towards Training-Efficient Video Foundation Models | Mar 28, 2023 | Action Classification, Action Recognition
NAAQA: A Neural Architecture for Acoustic Question Answering | Jun 11, 2021 | Acoustic Question Answering, Question Answering
Active Learning for Visual Question Answering: An Empirical Study | Nov 6, 2017 | Active Learning, Visual Question Answering
Modulating early visual processing by language | Jul 2, 2017 | Question Answering, Visual Question Answering
StarVQA+: Co-training Space-Time Attention for Video Quality Assessment | Jun 21, 2023 | Video Quality Assessment, Visual Question Answering (VQA)
StarVQA: Space-Time Attention for Video Quality Assessment | Aug 22, 2021 | Video Quality Assessment, Visual Question Answering (VQA)
Navigating Cultural Chasms: Exploring and Unlocking the Cultural POV of Text-To-Image Models | Oct 3, 2023 | Image Generation, Visual Question Answering (VQA)
VTQA: Visual Text Question Answering via Entity Alignment and Cross-Media Reasoning | Mar 5, 2023 | Answer Generation, Entity Alignment
End-to-End Audio Visual Scene-Aware Dialog using Multimodal Attention-Based Video Features | Jun 21, 2018 | Question Answering, Video Description
Modularized Zero-shot VQA with Pre-trained Models | May 27, 2023 | object-detection, Object Detection
NeSyCoCo: A Neuro-Symbolic Concept Composer for Compositional Generalization | Dec 20, 2024 | Compositional Generalization (AVG), Novel Concepts
ELIP: Efficient Language-Image Pre-training with Fewer Vision Tokens | Sep 28, 2023 | Cross-Modal Retrieval, GPU
Co-attending Free-form Regions and Detections with Multi-modal Multiplicative Feature Embedding for Visual Question Answering | Nov 18, 2017 | Form, Visual Question Answering
Neural Module Networks | Nov 9, 2015 | Visual Question Answering, Visual Question Answering (VQA)
Effective Approaches to Batch Parallelization for Dynamic Neural Network Architectures | Jul 8, 2017 | Mixture-of-Experts, Question Answering
Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding | Oct 4, 2018 | Question Answering, Representation Learning
CLIPVQA:Video Quality Assessment via CLIP | Jul 6, 2024 | Video Quality Assessment, Visual Question Answering (VQA)
A Joint Sequence Fusion Model for Video Question Answering and Retrieval | Aug 7, 2018 | Decoder, Multiple-choice
Structured Attentions for Visual Question Answering | Aug 7, 2017 | Visual Question Answering, Visual Question Answering (VQA)
ECG Heartbeat Classification: A Deep Transferable Representation | Apr 19, 2018 | Arrhythmia Detection, Electrocardiography (ECG)
Structured Triplet Learning with POS-tag Guided Attention for Visual Question Answering | Jan 24, 2018 | Multiple-choice, POS
Visual Question Answering using Deep Learning: A Survey and Performance Analysis | Aug 27, 2019 | Common Sense Reasoning, Question Answering
Modeling Relationships in Referential Expressions with Compositional Modular Networks | Nov 30, 2016 | Visual Question Answering (VQA)
MIRTT: Learning Multimodal Interaction Representations from Trilinear Transformers for Visual Question Answering | Nov 1, 2021 | multimodal interaction, Multiple-choice
Mimic and Fool: A Task Agnostic Adversarial Attack | Jun 11, 2019 | Adversarial Attack, Image Captioning
MHSAN: Multi-Head Self-Attention Network for Visual Semantic Embedding | Jan 11, 2020 | Image Captioning, Image-text Retrieval
AdCare-VLM: Leveraging Large Vision Language Model (LVLM) to Monitor Long-Term Medication Adherence and Care | May 1, 2025 | Language Modeling, Language Modelling
No Images, No Problem: Retaining Knowledge in Continual VQA with Questions-Only Memory | Feb 6, 2025 | Continual Learning, Question Answering
Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning | Mar 6, 2020 | Density Estimation, Noise Estimation
Noise-Induced Barren Plateaus in Variational Quantum Algorithms | Jul 28, 2020 | Visual Question Answering (VQA)
Watch Out Your Album! On the Inadvertent Privacy Memorization in Multi-Modal Large Language Models | Mar 3, 2025 | Memorization, Question Answering
No-Reference Video Quality Assessment Based on Benford’s Law and Perceptual Features | Nov 12, 2021 | No-Reference Image Quality Assessment, Video Quality Assessment
ClinKD: Cross-Modal Clinical Knowledge Distiller For Multi-Task Medical Images | Feb 9, 2025 | Clinical Knowledge, Medical Visual Question Answering
No-Reference Video Quality Assessment Using Space-Time Chips | Aug 23, 2020 | Video Quality Assessment, Visual Question Answering (VQA)
Study on the Assessment of the Quality of Experience of Streaming Video | Dec 8, 2020 | regression, Video Quality Assessment
Subjective and Objective Analysis of Indian Social Media Video Quality | Jan 5, 2024 | Mixture-of-Experts, Visual Question Answering (VQA)
USER-VLM 360: Personalized Vision Language Models with User-aware Tuning for Social Human-Robot Interactions | Feb 15, 2025 | Multimodal Reasoning, Visual Question Answering (VQA)
Med-PMC: Medical Personalized Multi-modal Consultation with a Proactive Ask-First-Observe-Next Paradigm | Aug 16, 2024 | Decision Making, Medical Visual Question Answering
Subjective and Objective Audio-Visual Quality Assessment for User Generated Content | Jul 10, 2023 | Video Quality Assessment, Visual Question Answering (VQA)
Subjective and Objective Quality Assessment of High-Motion Sports Videos at Low-Bitrates | Jul 12, 2022 | Video Quality Assessment, Visual Question Answering (VQA)
Medical Large Vision Language Models with Multi-Image Visual Ability | May 25, 2025 | Question Answering, Visual Question Answering (VQA)
Visual Question Answering: which investigated applications? | Mar 4, 2021 | Image Captioning, Question Answering
MedHallTune: An Instruction-Tuning Benchmark for Mitigating Medical Hallucination in Vision-Language Models | Feb 28, 2025 | Decision Making, Hallucination
NTIRE 2025 Challenge on Video Quality Enhancement for Video Conferencing: Datasets, Methods and Results | May 25, 2025 | valid, Video Quality Assessment
Measuring Faithful and Plausible Visual Grounding in VQA | May 24, 2023 | Question Answering, Visual Grounding
μ-Bench: A Vision-Language Benchmark for Microscopy Understanding | Jul 1, 2024 | Cell Detection, Classification
Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding | Mar 18, 2025 | document understanding, Question Answering