How Well Can Vison-Language Models Understand Humans' Intention? An Open-ended Theory of Mind Question Evaluation Benchmark Mar 28, 2025 Question Answering Visual Question Answering
CP-LLM: Context and Pixel Aware Large Language Model for Video Quality Assessment May 21, 2025 Language Modeling Language Modelling
Connecting phases of matter to the flatness of the loss landscape in analog variational quantum algorithms Jun 16, 2025 Visual Question Answering (VQA)
CQ-VQA: Visual Question Answering on Categorized Questions Feb 17, 2020 Question Answering Visual Question Answering
Connecting Language and Vision to Actions Jul 1, 2018 Image Captioning Language Modeling
Guiding Medical Vision-Language Models with Explicit Visual Prompts: Framework Design and Comprehensive Exploration of Prompt Variations Jan 4, 2025 Decoder Visual Question Answering (VQA)
A Transformer-based Cross-modal Fusion Model with Adversarial Training for VQA Challenge 2021 Jun 24, 2021 Visual Question Answering (VQA)
Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions? Jun 11, 2016 Question Answering Visual Question Answering
Hummingbird: High Fidelity Image Generation via Multimodal Context Alignment Feb 7, 2025 Diversity Human-Object Interaction Detection
HVS Revisited: A Comprehensive Video Quality Assessment Framework Oct 9, 2022 Video Quality Assessment Visual Question Answering (VQA)
Grounding Complex Navigational Instructions Using Scene Graphs Jun 3, 2021 Question Answering reinforcement-learning
Grounding Chest X-Ray Visual Question Answering with Generated Radiology Reports May 22, 2025 Answer Generation Question Answering
Grounding Answers for Visual Questions Asked by Visually Impaired People Jun 20, 2022 Question Answering Visual Question Answering
A Token-level Text Image Foundation Model for Document Understanding Mar 4, 2025 document understanding Visual Question Answering (VQA)
Large Scale Scene Text Verification with Guided Attention Apr 23, 2018 Question Answering Scene Text Detection
LEAF-QA: Locate, Encode & Attend for Figure Question Answering Jul 30, 2019 Chart Question Answering Question Answering
Compressing Visual-linguistic Model via Knowledge Distillation Apr 5, 2021 Image Captioning Knowledge Distillation
ICDAR 2021 Competition on Document VisualQuestion Answering Nov 10, 2021 Visual Question Answering (VQA)
Grounded Word Sense Translation Jun 1, 2019 Grounded language learning Machine Translation
LAPDoc: Layout-Aware Prompting for Documents Feb 15, 2024 document understanding Key Information Extraction
A Dataset for Multimodal Question Answering in the Cultural Heritage Domain Dec 1, 2016 Question Answering Speech Recognition
Neural Reasoning, Fast and Slow, for Video Question Answering Jul 10, 2019 Natural Questions Question Answering
Griffon-G: Bridging Vision-Language and Vision-Centric Tasks via Large Multimodal Models Oct 21, 2024 Instruction Following object-detection
A Thousand Words Are Worth More Than a Picture: Natural Language-Centric Outside-Knowledge Visual Question Answering Jan 14, 2022 Generative Question Answering Image to text
Graph-Structured Representations for Visual Question Answering Sep 19, 2016 Multiple-choice Question Answering
CLIPPO: Image-and-Language Understanding from Pixels Only Dec 15, 2022 Contrastive Learning image-classification
Compound Tokens: Channel Fusion for Vision-Language Representation Learning Dec 2, 2022 Decoder Language Modeling
Image Captioning and Visual Question Answering Based on Attributes and External Knowledge Mar 9, 2016 General Knowledge Image Captioning
Image Captioning with Compositional Neural Module Networks Jul 10, 2020 Image Captioning Question Answering
Image Manipulation via Multi-Hop Instructions -- A New Dataset and Weakly-Supervised Neuro-Symbolic Approach May 23, 2023 Image Manipulation Question Answering
Graph Relation Transformer: Incorporating pairwise object features into the Transformer architecture Nov 11, 2021 Graph Attention Question Answering
Bilinear Graph Networks for Visual Question Answering Jul 23, 2019 Question Answering Visual Question Answering
Aligning MAGMA by Few-Shot Learning and Finetuning Oct 18, 2022 Few-Shot Learning Image Captioning
ImageTTR: Grounding Type Theory with Records in Image Classification for Visual Question Answering Jun 1, 2019 General Classification image-classification
Graph Neural Networks in Vision-Language Image Understanding: A Survey Mar 7, 2023 Image Captioning Image Retrieval
CrossVQA: Scalably Generating Benchmarks for Systematically Testing VQA Generalization Nov 1, 2021 Answer Generation Question-Answer-Generation
Compositional Memory for Visual Question Answering Nov 18, 2015 Question Answering Visual Question Answering
Improved Bilinear Pooling with CNNs Jul 21, 2017 GPU Question Answering
Graph Edit Distance Reward: Learning to Edit Scene Graph Aug 15, 2020 Graph Matching Image Retrieval
Improved Few-Shot Image Classification Through Multiple-Choice Questions Jul 23, 2024 Articles Few-Shot Image Classification
A survey on VQA_Datasets and Approaches May 2, 2021 Question Answering Survey
Improving and Diagnosing Knowledge-Based Visual Question Answering via Entity Enhanced Knowledge Injection Dec 13, 2021 Common Sense Reasoning Knowledge Graph Embeddings
Improving Automatic VQA Evaluation Using Large Language Models Oct 4, 2023 In-Context Learning Question Answering
Improving Cross-Modal Understanding in Visual Dialog via Contrastive Learning Apr 15, 2022 Contrastive Learning Question Answering
Improving Data Augmentation for Robust Visual Question Answering with Effective Curriculum Learning Jan 28, 2024 Data Augmentation Question Answering
Improving Generalization in Visual Reasoning via Self-Ensemble Oct 28, 2024 Visual Question Answering (VQA) Visual Reasoning
A survey on knowledge-enhanced multimodal learning Nov 19, 2022 Conditional Image Generation Factual Visual Question Answering
Improving mitosis detection on histopathology images using large vision-language models Oct 11, 2023 Domain Generalization Image Captioning
Graph-based Heuristic Search for Module Selection Procedure in Neural Module Network Sep 30, 2020 Heuristic Search Question Answering
GRAM: Global Reasoning for Multi-Page VQA Jan 7, 2024 Question Answering Visual Question Answering