MoColl: Agent-Based Specific and General Model Collaboration for Image Captioning Jan 3, 2025 Diagnostic General Knowledge
— Unverified 00 Modeling Coreference Relations in Visual Dialog Mar 6, 2022 Question Answering Visual Dialog
— Unverified 00 Modern Question Answering Datasets and Benchmarks: A Survey Jun 30, 2022 Deep Learning Question Answering
— Unverified 00 Modular Graph Attention Network for Complex Visual Relational Reasoning Nov 22, 2020 Graph Attention Question Answering
— Unverified 00 Modulated Self-attention Convolutional Network for VQA Oct 8, 2019 Question Answering Visual Question Answering
— Unverified 00 MoEMoE: Question Guided Dense and Scalable Sparse Mixture-of-Expert for Multi-source Multi-modal Answering Mar 8, 2025 Answer Generation Mixture-of-Experts
— Unverified 00 Motion-Appearance Co-Memory Networks for Video Question Answering Mar 29, 2018 Question Answering Video Question Answering
— Unverified 00 MPDrive: Improving Spatial Understanding with Marker-Based Prompt Learning for Autonomous Driving Apr 1, 2025 Autonomous Driving Prompt Learning
— Unverified 00 mR^2AG: Multimodal Retrieval-Reflection-Augmented Generation for Knowledge-Based VQA Nov 22, 2024 RAG Retrieval
— Unverified 00 MRET: Multi-resolution Transformer for Video Quality Assessment Mar 13, 2023 Video Quality Assessment Video Recognition
— Unverified 00 Mucko: Multi-Layer Cross-Modal Knowledge Reasoning for Fact-based Visual Question Answering Jun 16, 2020 Question Answering Visual Question Answering
— Unverified 00 Muffin or Chihuahua? Challenging Multimodal Large Language Models with Multipanel VQA Jan 29, 2024 Benchmarking Image Comprehension
— Unverified 00 Multi-Agents Based on Large Language Models for Knowledge-based Visual Question Answering Dec 24, 2024 Question Answering Visual Question Answering
— Unverified 00 Multi-CLIP: Contrastive Vision-Language Pre-training for Question Answering tasks in 3D Scenes Jun 4, 2023 Common Sense Reasoning Question Answering
— Unverified 00 Multi-Clue Reasoning with Memory Augmentation for Knowledge-based Visual Question Answering Dec 20, 2023 Question Answering Visual Question Answering
— Unverified 00 MuLTI: Efficient Video-and-Language Understanding with Text-Guided MultiWay-Sampler and Multiple Choice Modeling Mar 10, 2023 Multi-Label Classification
— Unverified 00 Multi-grained Attention with Object-level Grounding for Visual Question Answering Jul 1, 2019 Object Question Answering
— Unverified 00 Multi-Layer Content Interaction Through Quaternion Product For Visual Question Answering Jan 3, 2020 Question Answering Video Description
— Unverified 00 Multi-Level Attention Networks for Visual Question Answering Jul 1, 2017 Question Answering Visual Question Answering
— Unverified 00 Multilevel Semantic-Aware Model for AI-Generated Video Quality Assessment Jan 6, 2025 Video Quality Assessment Visual Question Answering (VQA)
— Unverified 00 Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks Apr 22, 2022 Question Answering Visual Commonsense Reasoning
— Unverified 00 Multimodal Commonsense Knowledge Distillation for Visual Question Answering Nov 5, 2024 Knowledge Distillation Question Answering
— Unverified 00 Multimodal Compact Bilinear Pooling for Multimodal Neural Machine Translation Mar 23, 2017 Decoder Machine Translation
— Unverified 00 Multimodal Continuous Visual Attention Mechanisms Apr 7, 2021 Clustering Question Answering
— Unverified 00 Multi-modal Deep Analysis for Multimedia Oct 11, 2019 Multi-modal Recommendation Question Answering
— Unverified 00 Multimodal Differential Network for Visual Question Generation Oct 1, 2018 Image Captioning Natural Questions
— Unverified 00 Multimodal Few-Shot Learning with Frozen Language Models Jun 25, 2021 Few-Shot Learning Language Modeling
— Unverified 00 Multi-Modal Fusion Transformer for Visual Question Answering in Remote Sensing Oct 10, 2022 Question Answering Representation Learning
— Unverified 00 Multimodal Graph Networks for Compositional Generalization in Visual Question Answering Dec 1, 2020 Graph Neural Network Question Answering
— Unverified 00 Multimodal grid features and cell pointers for Scene Text Visual Question Answering Jun 1, 2020 Question Answering Visual Question Answering
— Unverified 00 Multi-Modal Hallucination Control by Visual Information Grounding Mar 20, 2024 Hallucination Visual Question Answering (VQA)
— Unverified 00 Multi-Modal Instruction-Tuning Small-Scale Language-and-Vision Assistant for Semiconductor Electron Micrograph Analysis Aug 27, 2024 Instruction Following Question Answering
— Unverified 00 Multimodal Integration of Human-Like Attention in Visual Question Answering Sep 27, 2021 Question Answering Visual Question Answering
— Unverified 00 Multimodal Intelligence: Representation Learning, Information Fusion, and Applications Nov 10, 2019 Caption Generation Image Generation
— Unverified 00 Multi-modality Latent Interaction Network for Visual Question Answering Aug 10, 2019 Language Modeling
— Unverified 00 Multimodal Learning and Reasoning for Visual Question Answering Dec 1, 2017 Question Answering Representation Learning
— Unverified 00 Scene Graph Reasoning with Prior Visual Relationship for Visual Question Answering Dec 23, 2018 Cross-Modal Information Retrieval Information Retrieval
— Unverified 00 Multimodal Machine Learning: Integrating Language, Vision and Speech Jul 1, 2017 Audio-Visual Speech Recognition BIG-bench Machine Learning
— Unverified 00 Multimodal Neural Graph Memory Networks for Visual Question Answering Jul 1, 2020 Graph Neural Network Question Answering
— Unverified 00 Multimodal Neural Machine Translation for Low-resource Language Pairs using Synthetic Data Jul 1, 2018 Image Description Machine Translation
— Unverified 00 Multimodal Reranking for Knowledge-Intensive Visual Question Answering Jul 17, 2024 Answer Generation Question Answering
— Unverified 00 Multi-Modal Retrieval Augmentation for Open-Ended and Knowledge-Intensive Video Question Answering Feb 17, 2025 Multiple-choice Question Answering
— Unverified 00 Multimodal Unified Attention Networks for Vision-and-Language Interactions Aug 12, 2019 Question Answering Visual Grounding
— Unverified 00 Multiple-Question Multiple-Answer Text-VQA Nov 15, 2023 Decoder Denoising
— Unverified 00 Multi-Prompts Learning with Cross-Modal Alignment for Attribute-based Person Re-Identification Dec 28, 2023 Attribute cross-modal alignment
— Unverified 00 Multi-task Learning of Hierarchical Vision-Language Representation Dec 3, 2018 Multi-Task Learning Question Answering
— Unverified 00 MUST-VQA: MUltilingual Scene-text VQA Sep 14, 2022 Question Answering Visual Question Answering
— Unverified 00 MuVAM: A Multi-View Attention-based Model for Medical Visual Question Answering Jul 7, 2021 Medical Visual Question Answering Missing Labels
— Unverified 00 NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples Oct 18, 2024 Attribute Question Answering
— Unverified 00 Natural Language Understanding and Inference with MLLM in Visual Question Answering: A Survey Nov 26, 2024 Natural Language Understanding Question Answering