Alignment, Mining and Fusion: Representation Alignment with Hard Negative Mining and Selective Knowledge Fusion for Medical Visual Question Answering Jan 1, 2025 Contrastive Learning Medical Visual Question Answering
KNVQA: A Benchmark for evaluation knowledge-based VQA Nov 21, 2023 Hallucination Object Hallucination
KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA Dec 20, 2020 Visual Question Answering (VQA)
Knowledge-Based Visual Question Answering in Videos Apr 17, 2020 Question Answering Video Question Answering
HD-EPIC: A Highly-Detailed Egocentric Video Dataset Feb 6, 2025 Action Recognition Nutrition
Knowledge Condensation and Reasoning for Knowledge-based VQA Mar 15, 2024 Question Answering Reading Comprehension
Attention Mechanism based Cognition-level Scene Understanding Apr 17, 2022 Question Answering Scene Understanding
Hardware-Friendly Static Quantization Method for Video Diffusion Transformers Feb 20, 2025 Quantization Video Generation
HAMMR: HierArchical MultiModal React agents for generic VQA Apr 8, 2024 Optical Character Recognition (OCR) Question Answering
Prompting Medical Large Vision-Language Models to Diagnose Pathologies by Visual Question Answering Jul 31, 2024 Diagnostic Hallucination
Knowledge Detection by Relevant Question and Image Attributes in Visual Question Answering Jun 8, 2023 Question Answering Retrieval
KVL-BERT: Knowledge Enhanced Visual-and-Linguistic BERT for Visual Commonsense Reasoning Dec 13, 2020 Sentence Visual Commonsense Reasoning
Language bias in Visual Question Answering: A Survey and Taxonomy Nov 16, 2021 Question Answering Visual Question Answering
LAPDoc: Layout-Aware Prompting for Documents Feb 15, 2024 document understanding Key Information Extraction
Learning by Hallucinating: Vision-Language Pre-training with Weak Supervision Oct 24, 2022 cross-modal alignment Cross-Modal Retrieval
Attention Guided Semantic Relationship Parsing for Visual Question Answering Oct 5, 2020 Object Question Answering
`Just because you are right, doesn't mean I am wrong': Overcoming a bottleneck in development and evaluation of Open-Ended VQA tasks Apr 1, 2021 Question Answering Visual Question Answering
KAT: A Knowledge Augmented Transformer for Vision-and-Language Jan 16, 2022 Answer Generation Decoder
Jointly Learning Truth-Conditional Denotations and Groundings using Parallel Attention Apr 14, 2021 Question Answering Visual Question Answering
HAUR: Human Annotation Understanding and Recognition Through Text-Heavy Images Dec 24, 2024 Optical Character Recognition (OCR) Question Answering
JTD-UAV: MLLM-Enhanced Joint Tracking and Description Framework for Anti-UAV Systems Jan 1, 2025 Question Answering Visual Question Answering
Kernel Pooling for Convolutional Neural Networks Jul 1, 2017 Face Recognition Fine-Grained Visual Categorization
Guiding Visual Question Generation Oct 15, 2021 Question Generation Question-Generation
HDR-ChipQA: No-Reference Quality Assessment on High Dynamic Range Videos Apr 25, 2023 Video Quality Assessment Visual Question Answering (VQA)
AlignVE: Visual Entailment Recognition Based on Alignment Relations Nov 16, 2022 Question Answering Relation
Guiding Visual Question Answering with Attention Priors May 25, 2022 Question Answering Visual Grounding
Connecting phases of matter to the flatness of the loss landscape in analog variational quantum algorithms Jun 16, 2025 Visual Question Answering (VQA)
Connecting Language and Vision to Actions Jul 1, 2018 Image Captioning Language Modeling
Guiding Medical Vision-Language Models with Explicit Visual Prompts: Framework Design and Comprehensive Exploration of Prompt Variations Jan 4, 2025 Decoder Visual Question Answering (VQA)
Hierarchical Memory for Long Video QA Jun 30, 2024 GPU Question Answering
Hierarchical Modeling for Medical Visual Question Answering with Cross-Attention Fusion Apr 4, 2025 Diagnostic Medical Visual Question Answering
A Transformer-based Cross-modal Fusion Model with Adversarial Training for VQA Challenge 2021 Jun 24, 2021 Visual Question Answering (VQA)
Joint Image Captioning and Question Answering May 22, 2018 Image Captioning Question Answering
Grounding Complex Navigational Instructions Using Scene Graphs Jun 3, 2021 Question Answering reinforcement-learning
Grounding Chest X-Ray Visual Question Answering with Generated Radiology Reports May 22, 2025 Answer Generation Question Answering
Highly Efficient No-reference 4K Video Quality Assessment with Full-Pixel Covering Sampling and Training Strategy Jul 30, 2024 4k Video Quality Assessment
Grounding Answers for Visual Questions Asked by Visually Impaired People Jun 20, 2022 Question Answering Visual Question Answering
HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training Dec 30, 2022 cross-modal alignment TGIF-Action
A Token-level Text Image Foundation Model for Document Understanding Mar 4, 2025 document understanding Visual Question Answering (VQA)
How good are deep models in understanding the generated images? Aug 23, 2022 Object Object Recognition
Joint learning of object graph and relation graph for visual question answering May 9, 2022 Attribute Graph Neural Network
Generating and Evaluating Explanations of Attended and Error-Inducing Input Regions for VQA Models Mar 26, 2021 Question Answering Visual Question Answering
Compressing Visual-linguistic Model via Knowledge Distillation Apr 5, 2021 Image Captioning Knowledge Distillation
Grounded Word Sense Translation Jun 1, 2019 Grounded language learning Machine Translation
It Takes Two to Tango: Towards Theory of AI's Mind Apr 3, 2017 Attribute Question Answering
Griffon-G: Bridging Vision-Language and Vision-Centric Tasks via Large Multimodal Models Oct 21, 2024 Instruction Following object-detection
How to Design Sample and Computationally Efficient VQA Models Mar 22, 2021 Question Answering Visual Question Answering
Co-VQA : Answering by Interactive Sub Question Sequence Nov 16, 2021 Question Answering Visual Question Answering
A Thousand Words Are Worth More Than a Picture: Natural Language-Centric Outside-Knowledge Visual Question Answering Jan 14, 2022 Generative Question Answering Image to text
iVQA: Inverse Visual Question Answering Oct 10, 2017 Question Answering Question Generation