Interpretable Visual Question Answering via Reasoning Supervision | Sep 7, 2023 | Common Sense Reasoning, Question Answering | Unverified | 0
S3C: Semi-Supervised VQA Natural Language Explanation via Self-Critical Learning | Sep 5, 2023 | Decision Making, Visual Question Answering (VQA) | Unverified | 0
Can I Trust Your Answer? Visually Grounded Video Question Answering | Sep 4, 2023 | Grounded Video Question Answering, Question Answering | Code Available | 1
Distraction-free Embeddings for Robust VQA | Aug 31, 2023 | Question Answering, Video Question Answering | Unverified | 0
Separate and Locate: Rethink the Text in Text-based Visual Question Answering | Aug 31, 2023 | Optical Character Recognition (OCR), Position | Code Available | 0
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | Aug 24, 2023 | Chart Question Answering, FS-MEVQA | Code Available | 5
VQA Therapy: Exploring Answer Differences by Visually Grounding Answers | Aug 21, 2023 | Question Answering, Visual Question Answering | Code Available | 0
BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions | Aug 19, 2023 | MME, Optical Character Recognition (OCR) | Code Available | 2
VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control | Aug 18, 2023 | Image Captioning, Text Generation | Code Available | 1
Open-vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models | Aug 18, 2023 | Multiple-choice, Question Answering | Code Available | 1
Uni-NLX: Unifying Textual Explanations for Vision and Vision-Language Tasks | Aug 17, 2023 | Question Answering, Text Generation | Code Available | 1
Pro-Cap: Leveraging a Frozen Vision-Language Model for Hateful Meme Detection | Aug 16, 2023 | Image Captioning, Language Modeling | Code Available | 1
TeCH: Text-guided Reconstruction of Lifelike Clothed Humans | Aug 16, 2023 | Descriptive, Question Answering | Code Available | 2
UGC Quality Assessment: Exploring the Impact of Saliency in Deep Feature-Based Quality Assessment | Aug 13, 2023 | Video Quality Assessment, Visual Question Answering (VQA) | Code Available | 0
Detecting and Preventing Hallucinations in Large Vision Language Models | Aug 11, 2023 | 16k, Hallucination | Code Available | 1
StableVQA: A Deep No-Reference Quality Assessment Model for Video Stability | Aug 9, 2023 | Optical Flow Estimation, Video Quality Assessment | Code Available | 1
SciGraphQA: A Large-Scale Synthetic Multi-Turn Question-Answering Dataset for Scientific Graphs | Aug 7, 2023 | Question Answering, Visual Question Answering | Code Available | 1
OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models | Aug 2, 2023 | Visual Question Answering, Visual Question Answering (VQA) | Code Available | 4
Ada-DQA: Adaptive Diverse Quality-aware Feature Acquisition for Video Quality Assessment | Aug 1, 2023 | Diversity, Knowledge Distillation | Unverified | 0
Making the V in Text-VQA Matter | Aug 1, 2023 | Optical Character Recognition (OCR), TextVQA | Unverified | 0
Workshop on Document Intelligence Understanding | Jul 31, 2023 | document understanding, Visual Question Answering (VQA) | Unverified | 0
Capturing Co-existing Distortions in User-Generated Content for No-reference Video Quality Assessment | Jul 31, 2023 | Action Recognition, Blocking | Unverified | 0
Bridging the Gap: Exploring the Capabilities of Bridge-Architectures for Complex Visual Reasoning Tasks | Jul 31, 2023 | Image Retrieval, Object | Unverified | 0
Context-VQA: Towards Context-Aware and Purposeful Visual Question Answering | Jul 28, 2023 | Question Answering, Visual Question Answering | Code Available | 0
BARTPhoBEiT: Pre-trained Sequence-to-Sequence and Image Transformers Models for Vietnamese Visual Question Answering | Jul 28, 2023 | Question Answering, Vietnamese Visual Question Answering | Unverified | 0
Med-Flamingo: a Multimodal Medical Few-shot Learner | Jul 27, 2023 | Medical Visual Question Answering, Question Answering | Code Available | 2
Analysis of Video Quality Datasets via Design of Minimalistic Video Quality Models | Jul 26, 2023 | Image Quality Assessment, No-Reference Image Quality Assessment | Code Available | 1
LOIS: Looking Out of Instance Semantics for Visual Question Answering | Jul 26, 2023 | Question Answering, Visual Question Answering | Unverified | 0
Expert Knowledge-Aware Image Difference Graph Representation Learning for Difference-Aware Medical Visual Question Answering | Jul 22, 2023 | Graph Representation Learning, Language Modeling | Code Available | 1
Robust Visual Question Answering: Datasets, Methods, and Future Challenges | Jul 21, 2023 | Question Answering, Visual Question Answering | Unverified | 0
NTIRE 2023 Quality Assessment of Video Enhancement Challenge | Jul 19, 2023 | Deblurring, Image Restoration | Unverified | 0
A reinforcement learning approach for VQA validation: an application to diabetic macular edema grading | Jul 19, 2023 | Medical Image Analysis, Question Answering | Unverified | 0
Explaining Autonomous Driving Actions with Visual Question Answering | Jul 19, 2023 | Autonomous Driving, Autonomous Vehicles | Code Available | 1
Generative Visual Question Answering | Jul 18, 2023 | Generative Visual Question Answering, Question Answering | Unverified | 0
Towards a performance analysis on pre-trained Visual Question Answering models for autonomous driving | Jul 18, 2023 | Autonomous Driving, Model Selection | Code Available | 0
Let's ViCE! Mimicking Human Cognitive Behavior in Image Generation Evaluation | Jul 18, 2023 | Image Generation, Question Answering | Unverified | 0
Rad-ReStruct: A Novel VQA Benchmark and Method for Structured Radiology Reporting | Jul 11, 2023 | Medical Visual Question Answering, Question Answering | Code Available | 1
Emu: Generative Pretraining in Multimodality | Jul 11, 2023 | Image Captioning, Image Generation | Code Available | 3
CAT-ViL: Co-Attention Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surgery | Jul 11, 2023 | Question Answering, Scene Understanding | Code Available | 1
Subjective and Objective Audio-Visual Quality Assessment for User Generated Content | Jul 10, 2023 | Video Quality Assessment, Visual Question Answering (VQA) | Code Available | 0
Divide, Evaluate, and Refine: Evaluating and Improving Text-to-Image Alignment with Iterative VQA Feedback | Jul 10, 2023 | Image Generation, Visual Question Answering (VQA) | Unverified | 0
Self-Adaptive Sampling for Efficient Video Question-Answering on Image-Text Models | Jul 9, 2023 | Question Answering, TGIF-Frame | Code Available | 1
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest | Jul 7, 2023 | Attribute, Common Sense Reasoning | Code Available | 2
UIT-Saviors at MEDVQA-GI 2023: Improving Multimodal Learning with Image Enhancement for Gastrointestinal Visual Question Answering | Jul 6, 2023 | Diagnostic, Image Enhancement | Unverified | 0
Localized Questions in Medical Visual Question Answering | Jul 3, 2023 | Medical Visual Question Answering, Question Answering | Code Available | 1
UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding | Jul 3, 2023 | Image-text matching, Sentence | Code Available | 1
DoReMi: Grounding Language Model by Detecting and Recovering from Plan-Execution Misalignment | Jul 1, 2023 | Language Modeling, Language Modelling | Unverified | 0
Lightweight Recurrent Cross-modal Encoder for Video Question Answering | Jun 30, 2023 | Action Recognition, Question Answering | Code Available | 0
Multimodal Prompt Retrieval for Generative Visual Question Answering | Jun 30, 2023 | Domain Adaptation, Generative Visual Question Answering | Code Available | 1
CausalVLR: A Toolbox and Benchmark for Visual-Linguistic Causal Reasoning | Jun 30, 2023 | Causal Inference, Medical Report Generation | Code Available | 3