DrishtiKon: Multi-Granular Visual Grounding for Text-Rich Document Images (Jun 26, 2025). Tasks: document understanding, Optical Character Recognition (OCR)
TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning Baselines (Oct 31, 2019). Tasks: Attribute, Question Answering
Lightweight Recurrent Cross-modal Encoder for Video Question Answering (Jun 30, 2023). Tasks: Action Recognition, Question Answering
Learning Visual Question Answering by Bootstrapping Hard Attention (Aug 1, 2018). Tasks: Hard Attention, Question Answering
TAB-VCR: Tags and Attributes based VCR Baselines (Dec 1, 2019). Tasks: Attribute, Question Answering
Outside Knowledge Conversational Video (OKCV) Dataset -- Dialoguing over Videos (Jun 11, 2025). Tasks: Question Answering, Visual Question Answering
Learning to Reason: End-to-End Module Networks for Visual Question Answering (Apr 18, 2017). Tasks: Visual Dialog, Visual Question Answering
Adversarial Training with OCR Modality Perturbation for Scene-Text Visual Question Answering (Mar 14, 2024). Tasks: Optical Character Recognition, Optical Character Recognition (OCR)
Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering (Dec 1, 2017). Tasks: Question Answering, Visual Question Answering
VGAT: A Cancer Survival Analysis Framework Transitioning from Generative Visual Question Answering to Genomic Reconstruction (Mar 25, 2025). Tasks: Generative Visual Question Answering, Question Answering
Overcoming Language Priors in Visual Question Answering via Distinguishing Superficially Similar Instances (Sep 18, 2022). Tasks: Attribute, Question Answering
Learning to Model and Ignore Dataset Bias with Mixed Capacity Ensembles (Nov 7, 2020). Tasks: Natural Language Inference, Question Answering
Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs (Apr 11, 2024). Tasks: Descriptive, Hallucination
VGA: Vision GUI Assistant -- Minimizing Hallucinations through Image-Centric Fine-Tuning (Jun 20, 2024). Tasks: Image Comprehension, Question Answering
Learning to Count Objects in Natural Images for Visual Question Answering (Feb 15, 2018). Tasks: Visual Question Answering, Visual Question Answering (VQA)
Are VLMs Really Blind (Oct 29, 2024). Tasks: Language Modeling, Language Modelling
Learning to Collocate Visual-Linguistic Neural Modules for Image Captioning (Oct 4, 2022). Tasks: Image Captioning, Sentence
Learning Representations of Sets through Optimized Permutations (Dec 10, 2018). Tasks: General Classification, Question Answering
TallyQA: Answering Complex Counting Questions (Oct 29, 2018). Tasks: Attribute, Object Counting
Learning from Lexical Perturbations for Consistent Visual Question Answering (Nov 26, 2020). Tasks: Question Answering, Visual Question Answering
Document Haystacks: Vision-Language Reasoning Over Piles of 1000+ Documents (Nov 23, 2024). Tasks: Question Answering, RAG
Learning Convolutional Text Representations for Visual Question Answering (May 18, 2017). Tasks: General Classification, image-classification
Learning content and context with language bias for Visual Question Answering (Dec 21, 2020). Tasks: Question Answering, Visual Question Answering
Learning Conditioned Graph Structures for Interpretable Visual Question Answering (Jun 19, 2018). Tasks: Question Answering, Visual Question Answering
DLaVA: Document Language and Vision Assistant for Answer Localization with Enhanced Interpretability and Trustworthiness (Nov 29, 2024). Tasks: Optical Character Recognition (OCR), Question Answering
P NP, at least in Visual Question Answering (Mar 26, 2020). Tasks: Question Answering, Visual Question Answering
Visual Reasoning with Multi-hop Feature Modulation (Aug 3, 2018). Tasks: Question Answering, Visual Dialog
Learning by Abstraction: The Neural State Machine (Jul 9, 2019). Tasks: Visual Question Answering (VQA), Visual Reasoning
VIBIKNet: Visual Bidirectional Kernelized Network for Visual Question Answering (Dec 12, 2016). Tasks: Question Answering, Visual Question Answering
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking (Apr 18, 2022). Tasks: cross-modal alignment, Document AI
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding (Dec 29, 2020). Tasks: Document Image Classification, Document Layout Analysis
Patent Figure Classification using Large Vision-language Models (Jan 22, 2025). Tasks: Classification, Few-Shot Learning
Task Formulation Matters When Learning Continually: A Case Study in Visual Question Answering (Sep 30, 2022). Tasks: Continual Learning, Question Answering
LAWS: Look Around and Warm-Start Natural Gradient Descent for Quantum Neural Networks (May 5, 2022). Tasks: Combinatorial Optimization, Visual Question Answering (VQA)
Visuo-Linguistic Question Answering (VLQA) Challenge (May 1, 2020). Tasks: Question Answering, Reading Comprehension
Latent Alignment and Variational Attention (Jul 10, 2018). Tasks: Hard Attention, Machine Translation
ViCLEVR: A Visual Reasoning Dataset and Hybrid Multimodal Fusion Model for Visual Question Answering in Vietnamese (Oct 27, 2023). Tasks: Information Retrieval, Natural Language Queries
ViConsFormer: Constituting Meaningful Phrases of Scene Texts using Transformer-based Method in Vietnamese Text-based Visual Question Answering (Oct 18, 2024). Tasks: Question Answering, Visual Question Answering
Large Models in Dialogue for Active Perception and Anomaly Detection (Jan 27, 2025). Tasks: Anomaly Detection, Question Answering
Large Language Models Understand Layout (Jul 8, 2024). Tasks: Question Answering, Visual Question Answering
Cascaded Mutual Modulation for Visual Reasoning (Sep 6, 2018). Tasks: Question Answering, Visual Question Answering
Language-Conditioned Graph Networks for Relational Reasoning (May 10, 2019). Tasks: Object, Referring Expression Comprehension
CARETS: A Consistency And Robustness Evaluative Test Suite for VQA (Mar 15, 2022). Tasks: Negation, Question Generation
Perceptual Score: What Data Modalities Does Your Model Perceive? (Oct 27, 2021). Tasks: Question Answering, Visual Dialog
TCC-Bench: Benchmarking the Traditional Chinese Culture Understanding Capabilities of MLLMs (May 16, 2025). Tasks: Benchmarking, Question Answering
Kvasir-VQA-x1: A Multimodal Dataset for Medical Reasoning and Robust MedVQA in Gastrointestinal Endoscopy (Jun 11, 2025). Tasks: Medical Visual Question Answering, Question Answering
Kvasir-VQA: A Text-Image Pair GI Tract Dataset (Sep 2, 2024). Tasks: Image Captioning, Image Generation
Bridging Languages through Images with Deep Partial Canonical Correlation Analysis (Jul 1, 2018). Tasks: Image Description, Image Retrieval
DisCoVQA: Temporal Distortion-Content Transformers for Video Quality Assessment (Jun 20, 2022). Tasks: Time Series Analysis, Video Quality Assessment
KOFFVQA: An Objectively Evaluated Free-form VQA Benchmark for Large Vision-Language Models in the Korean Language (Mar 31, 2025). Tasks: Form, Question Answering