RSAdapter: Adapting Multimodal Models for Remote Sensing Visual Question Answering | Oct 19, 2023 | Image Captioning, Question Answering | Code Available | 0
UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models | Oct 17, 2023 | Attribute, Question Answering | Code Available | 0
Exploring Sparse Spatial Relation in Graph Inference for Text-Based VQA | Oct 13, 2023 | Graph Learning, Object | Unverified | 0
Open-Set Knowledge-Based Visual Question Answering with Inference Paths | Oct 12, 2023 | Knowledge Graphs, Multi-class Classification | Code Available | 0
Jaeger: A Concatenation-Based Multi-Transformer VQA Model | Oct 11, 2023 | Dimensionality Reduction, model | Unverified | 0
Off-Policy Evaluation for Human Feedback | Oct 11, 2023 | Off-policy evaluation, Reinforcement Learning (RL) | Unverified | 0
Improving mitosis detection on histopathology images using large vision-language models | Oct 11, 2023 | Domain Generalization, Image Captioning | Unverified | 0
How (not) to ensemble LVLMs for VQA | Oct 10, 2023 | Retrieval, Visual Question Answering (VQA) | Unverified | 0
Causal Reasoning through Two Layers of Cognition for Improving Generalization in Visual Question Answering | Oct 9, 2023 | Answer Generation, Question Answering | Unverified | 0
Negative Object Presence Evaluation (NOPE) to Measure Object Hallucination in Vision-Language Models | Oct 9, 2023 | Hallucination, Object | Unverified | 0
Improving Automatic VQA Evaluation Using Large Language Models | Oct 4, 2023 | In-Context Learning, Question Answering | Unverified | 0
On the Cognition of Visual Question Answering Models and Human Intelligence: A Comparative Study | Oct 4, 2023 | Question Answering, Visual Question Answering | Unverified | 0
SelfGraphVQA: A Self-Supervised Graph Neural Network for Scene-based Question Answering | Oct 3, 2023 | Graph Neural Network, Question Answering | Unverified | 0
Navigating Cultural Chasms: Exploring and Unlocking the Cultural POV of Text-To-Image Models | Oct 3, 2023 | Image Generation, Visual Question Answering (VQA) | Code Available | 0
ELIP: Efficient Language-Image Pre-training with Fewer Vision Tokens | Sep 28, 2023 | Cross-Modal Retrieval, GPU | Code Available | 0
Tackling VQA with Pretrained Foundation Models without Further Training | Sep 27, 2023 | Question Answering, Visual Question Answering | Unverified | 0
InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition | Sep 26, 2023 | Articles, Image Comprehension | Code Available | 0
Implicit Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis | Sep 21, 2023 | Cross-Modal Retrieval, Image Captioning | Code Available | 0
Sentence Attention Blocks for Answer Grounding | Sep 20, 2023 | Question Answering, Sentence | Unverified | 0
Visual Question Answering in the Medical Domain | Sep 20, 2023 | Contrastive Learning, Medical Visual Question Answering | Unverified | 0
Syntax Tree Constrained Graph Network for Visual Question Answering | Sep 17, 2023 | Question Answering, Visual Question Answering | Unverified | 0
D3: Data Diversity Design for Systematic Generalization in Visual Question Answering | Sep 15, 2023 | Diversity, Question Answering | Code Available | 0
Interpretable Visual Question Answering via Reasoning Supervision | Sep 7, 2023 | Common Sense Reasoning, Question Answering | Unverified | 0
S3C: Semi-Supervised VQA Natural Language Explanation via Self-Critical Learning | Sep 5, 2023 | Decision Making, Visual Question Answering (VQA) | Unverified | 0
Distraction-free Embeddings for Robust VQA | Aug 31, 2023 | Question Answering, Video Question Answering | Unverified | 0
Separate and Locate: Rethink the Text in Text-based Visual Question Answering | Aug 31, 2023 | Optical Character Recognition (OCR), Position | Code Available | 0
VQA Therapy: Exploring Answer Differences by Visually Grounding Answers | Aug 21, 2023 | Question Answering, Visual Question Answering | Code Available | 0
UGC Quality Assessment: Exploring the Impact of Saliency in Deep Feature-Based Quality Assessment | Aug 13, 2023 | Video Quality Assessment, Visual Question Answering (VQA) | Code Available | 0
Ada-DQA: Adaptive Diverse Quality-aware Feature Acquisition for Video Quality Assessment | Aug 1, 2023 | Diversity, Knowledge Distillation | Unverified | 0
Making the V in Text-VQA Matter | Aug 1, 2023 | Optical Character Recognition (OCR), TextVQA | Unverified | 0
Bridging the Gap: Exploring the Capabilities of Bridge-Architectures for Complex Visual Reasoning Tasks | Jul 31, 2023 | Image Retrieval, Object | Unverified | 0
Capturing Co-existing Distortions in User-Generated Content for No-reference Video Quality Assessment | Jul 31, 2023 | Action Recognition, Blocking | Unverified | 0
Workshop on Document Intelligence Understanding | Jul 31, 2023 | document understanding, Visual Question Answering (VQA) | Unverified | 0
Context-VQA: Towards Context-Aware and Purposeful Visual Question Answering | Jul 28, 2023 | Question Answering, Visual Question Answering | Code Available | 0
BARTPhoBEiT: Pre-trained Sequence-to-Sequence and Image Transformers Models for Vietnamese Visual Question Answering | Jul 28, 2023 | Question Answering, Vietnamese Visual Question Answering | Unverified | 0
LOIS: Looking Out of Instance Semantics for Visual Question Answering | Jul 26, 2023 | Question Answering, Visual Question Answering | Unverified | 0
Robust Visual Question Answering: Datasets, Methods, and Future Challenges | Jul 21, 2023 | Question Answering, Visual Question Answering | Unverified | 0
NTIRE 2023 Quality Assessment of Video Enhancement Challenge | Jul 19, 2023 | Deblurring, Image Restoration | Unverified | 0
A reinforcement learning approach for VQA validation: an application to diabetic macular edema grading | Jul 19, 2023 | Medical Image Analysis, Question Answering | Unverified | 0
Towards a performance analysis on pre-trained Visual Question Answering models for autonomous driving | Jul 18, 2023 | Autonomous Driving, Model Selection | Code Available | 0
Generative Visual Question Answering | Jul 18, 2023 | Generative Visual Question Answering, Question Answering | Unverified | 0
Let's ViCE! Mimicking Human Cognitive Behavior in Image Generation Evaluation | Jul 18, 2023 | Image Generation, Question Answering | Unverified | 0
Divide, Evaluate, and Refine: Evaluating and Improving Text-to-Image Alignment with Iterative VQA Feedback | Jul 10, 2023 | Image Generation, Visual Question Answering (VQA) | Unverified | 0
Subjective and Objective Audio-Visual Quality Assessment for User Generated Content | Jul 10, 2023 | Video Quality Assessment, Visual Question Answering (VQA) | Code Available | 0
UIT-Saviors at MEDVQA-GI 2023: Improving Multimodal Learning with Image Enhancement for Gastrointestinal Visual Question Answering | Jul 6, 2023 | Diagnostic, Image Enhancement | Unverified | 0
DoReMi: Grounding Language Model by Detecting and Recovering from Plan-Execution Misalignment | Jul 1, 2023 | Language Modeling, Language Modelling | Unverified | 0
Lightweight Recurrent Cross-modal Encoder for Video Question Answering | Jun 30, 2023 | Action Recognition, Question Answering | Code Available | 0
Deep Equilibrium Multimodal Fusion | Jun 29, 2023 | Visual Question Answering (VQA) | Unverified | 0
Pre-Training Multi-Modal Dense Retrievers for Outside-Knowledge Visual Question Answering | Jun 28, 2023 | Passage Retrieval, Question Answering | Code Available | 0
Visual Question Answering in Remote Sensing with Cross-Attention and Multimodal Information Bottleneck | Jun 25, 2023 | object-detection, Object Detection | Unverified | 0