OmniFusion Technical Report Apr 9, 2024 MM-Vet TextVQA
OmniNet: A unified architecture for multi-modal multi-task learning Jul 17, 2019 Image Captioning Multi-Task Learning
Open-Ended Multi-Modal Relational Reasoning for Video Question Answering Dec 1, 2020 Question Answering Relational Reasoning
BioD2C: A Dual-level Semantic Consistency Constraint Framework for Biomedical VQA Mar 4, 2025 Medical Diagnosis Question Answering
Object Attribute Matters in Visual Question Answering Dec 20, 2023 Attribute Graph Neural Network
DLaVA: Document Language and Vision Assistant for Answer Localization with Enhanced Interpretability and Trustworthiness Nov 29, 2024 Optical Character Recognition (OCR) Question Answering
OG-SGG: Ontology-Guided Scene Graph Generation. A Case Study in Transfer Learning for Telepresence Robotics Feb 21, 2022 BIG-bench Machine Learning Graph Generation
Visuo-Linguistic Question Answering (VLQA) Challenge May 1, 2020 Question Answering Reading Comprehension
NTIRE 2025 Challenge on Video Quality Enhancement for Video Conferencing: Datasets, Methods and Results May 25, 2025 valid Video Quality Assessment
BinaryVQA: A Versatile Test Set to Evaluate the Out-of-Distribution Generalization of VQA Models Jan 28, 2023 Out-of-Distribution Generalization Question Answering
DisCoVQA: Temporal Distortion-Content Transformers for Video Quality Assessment Jun 20, 2022 Time Series Analysis Video Quality Assessment
Towards Flexible Evaluation for Generative Visual Question Answering Aug 1, 2024 Decoder Generative Visual Question Answering
Discovering the Unknown Knowns: Turning Implicit Knowledge in the Dataset into Explicit Training Examples for Visual Question Answering Sep 13, 2021 Data Augmentation Question Answering
No-Reference Video Quality Assessment Based on Benford’s Law and Perceptual Features Nov 12, 2021 No-Reference Image Quality Assessment Video Quality Assessment
Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning Mar 6, 2020 Density Estimation Noise Estimation
Noise-Induced Barren Plateaus in Variational Quantum Algorithms Jul 28, 2020 Visual Question Answering (VQA)
No-Reference Video Quality Assessment Using Space-Time Chips Aug 23, 2020 Video Quality Assessment Visual Question Answering (VQA)
Open-Ended Visual Question-Answering Oct 9, 2016 Question Answering Sentence
Diffusion-Refined VQA Annotations for Semi-Supervised Gaze Following Jun 4, 2024 Question Answering Visual Question Answering
Difficult Task Yes but Simple Task No: Unveiling the Laziness in Multimodal LLMs Oct 15, 2024 Image Description Multiple-choice
Differential Attention for Visual Question Answering Apr 1, 2018 Question Answering Visual Question Answering
Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis Feb 11, 2023 Image-text Retrieval Knowledge Graphs
Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model Jun 15, 2024 Question Answering Video Understanding
Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding Oct 4, 2018 Question Answering Representation Learning
Neural Module Networks Nov 9, 2015 Visual Question Answering Visual Question Answering (VQA)
Did the Model Understand the Question? May 14, 2018 model Question Answering
Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data Jul 24, 2020 Visual Dialog Visual Question Answering (VQA)
Diagnosing and Mitigating Modality Interference in Multimodal Large Language Models May 26, 2025 image-classification Image Classification
Beyond Bilinear: Generalized Multimodal Factorized High-order Pooling for Visual Question Answering Aug 10, 2017 Question Answering Visual Question Answering
Beyond Accuracy: A Consolidated Tool for Visual Question Answering Benchmarking Oct 11, 2021 Benchmarking Question Answering
BERTHop: An Effective Vision-and-Language Model for Chest X-ray Disease Diagnosis Aug 10, 2021 Language Modeling Language Modelling
Design as Desired: Utilizing Visual Question Answering for Multimodal Pre-training Mar 30, 2024 Contrastive Learning Question Answering
NAAQA: A Neural Architecture for Acoustic Question Answering Jun 11, 2021 Acoustic Question Answering Question Answering
Navigating Cultural Chasms: Exploring and Unlocking the Cultural POV of Text-To-Image Models Oct 3, 2023 Image Generation Visual Question Answering (VQA)
NeSyCoCo: A Neuro-Symbolic Concept Composer for Compositional Generalization Dec 20, 2024 Compositional Generalization (AVG) Novel Concepts
A Dual-Attention Learning Network with Word and Sentence Embedding for Medical Visual Question Answering Oct 1, 2022 Medical Visual Question Answering Question Answering
Multiview Contrastive Learning for Completely Blind Video Quality Assessment of User Generated Content Jul 13, 2022 Contrastive Learning Optical Flow Estimation
Delving Deeper into Cross-lingual Visual Question Answering Feb 15, 2022 Inductive Bias Question Answering
Multi-Target Embodied Question Answering Apr 9, 2019 Embodied Question Answering Navigate
MUREL: Multimodal Relational Reasoning for Visual Question Answering Feb 25, 2019 Relational Reasoning Visual Question Answering
Multiscale Byte Language Models -- A Hierarchical Architecture for Causal Million-Length Sequence Modeling Feb 20, 2025 Decoder GPU
Deep Modular Co-Attention Networks for Visual Question Answering Jun 25, 2019 Question Answering Visual Question Answering
Multi-Sourced Compositional Generalization in Visual Question Answering May 29, 2025 Question Answering Visual Question Answering
MUTAN: Multimodal Tucker Fusion for Visual Question Answering May 18, 2017 Visual Question Answering Visual Question Answering (VQA)
Multimodal Residual Learning for Visual QA Jun 5, 2016 Multiple-choice Question Answering
Benchmarking Multi-dimensional AIGC Video Quality Assessment: A Dataset and Unified Model Jul 31, 2024 Benchmarking Large Language Model
Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism Apr 29, 2024 document understanding GPU
Multimodal Hypothetical Summary for Retrieval-based Multi-image Question Answering Dec 19, 2024 Contrastive Learning Language Modeling
Multiple interaction learning with question-type prior knowledge for constraining answer search space in visual question answering Sep 23, 2020 Question Answering Visual Question Answering
No Images, No Problem: Retaining Knowledge in Continual VQA with Questions-Only Memory Feb 6, 2025 Continual Learning Question Answering