GPT-4V-AD: Exploring Grounding Potential of VQA-oriented GPT-4V for Zero-shot Anomaly Detection | Nov 5, 2023 | Anomaly Detection, Question Answering | Code Available 1
From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities | Nov 1, 2023 | Navigate, Question Answering | Unverified 0
VQA-GEN: A Visual Question Answering Benchmark for Domain Generalization | Nov 1, 2023 | Domain Generalization, Question Answering | Unverified 0
A Systematic Evaluation of GPT-4V's Multimodal Capability for Medical Image Analysis | Oct 31, 2023 | Descriptive, Medical Image Analysis | Unverified 0
Language Guided Visual Question Answering: Elevate Your Multimodal Language Model Using Knowledge-Enriched Prompts | Oct 31, 2023 | Image Captioning, Language Modeling | Code Available 1
Multimodal ChatGPT for Medical Applications: an Experimental Study of GPT-4V | Oct 29, 2023 | Diagnostic, Language Modeling | Code Available 1
Dynamic Task and Weight Prioritization Curriculum Learning for Multimodal Imagery | Oct 29, 2023 | Deep Learning, Multimodal Deep Learning | Code Available 0
EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images | Oct 28, 2023 | Decision Making, Medical Visual Question Answering | Code Available 1
3D-Aware Visual Question Answering about Parts, Poses and Occlusions | Oct 27, 2023 | Question Answering, Visual Question Answering | Code Available 1
ViCLEVR: A Visual Reasoning Dataset and Hybrid Multimodal Fusion Model for Visual Question Answering in Vietnamese | Oct 27, 2023 | Information Retrieval, Natural Language Queries | Code Available 0
Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation | Oct 27, 2023 | Image Generation, Question Answering | Unverified 0
Incorporating Probing Signals into Multimodal Machine Translation via Visual Question-Answering Pairs | Oct 26, 2023 | Attribute, Machine Translation | Code Available 0
Exploring Question Decomposition for Zero-Shot VQA | Oct 25, 2023 | Question Answering, Visual Question Answering | Unverified 0
Geometry-Aware Video Quality Assessment for Dynamic Digital Human | Oct 24, 2023 | Attribute, Video Quality Assessment | Unverified 0
Towards Perceiving Small Visual Details in Zero-shot Visual Question Answering with Multimodal LLMs | Oct 24, 2023 | Question Answering, Visual Question Answering | Code Available 1
Large Language Models are Temporal and Causal Reasoners for Video Question Answering | Oct 24, 2023 | Natural Language Understanding, Question Answering | Code Available 1
LXMERT Model Compression for Visual Question Answering | Oct 23, 2023 | model, Model Compression | Code Available 0
HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models | Oct 23, 2023 | Diagnostic, Hallucination | Code Available 2
A Simple Baseline for Knowledge-Based Visual Question Answering | Oct 20, 2023 | In-Context Learning, Question Answering | Code Available 0
RSAdapter: Adapting Multimodal Models for Remote Sensing Visual Question Answering | Oct 19, 2023 | Image Captioning, Question Answering | Code Available 0
UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models | Oct 17, 2023 | Attribute, Question Answering | Code Available 0
VLIS: Unimodal Language Models Guide Multimodal Language Generation | Oct 15, 2023 | Caption Generation, Explanation Generation | Code Available 1
PaLI-3 Vision Language Models: Smaller, Faster, Stronger | Oct 13, 2023 | Chart Question Answering, Cross-Modal Retrieval | Code Available 1
Exploring Sparse Spatial Relation in Graph Inference for Text-Based VQA | Oct 13, 2023 | Graph Learning, Object | Unverified 0
Open-Set Knowledge-Based Visual Question Answering with Inference Paths | Oct 12, 2023 | Knowledge Graphs, Multi-class Classification | Code Available 0
Improving mitosis detection on histopathology images using large vision-language models | Oct 11, 2023 | Domain Generalization, Image Captioning | Unverified 0
Jaeger: A Concatenation-Based Multi-Transformer VQA Model | Oct 11, 2023 | Dimensionality Reduction, model | Unverified 0
Off-Policy Evaluation for Human Feedback | Oct 11, 2023 | Off-policy evaluation, Reinforcement Learning (RL) | Unverified 0
How (not) to ensemble LVLMs for VQA | Oct 10, 2023 | Retrieval, Visual Question Answering (VQA) | Unverified 0
What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models | Oct 10, 2023 | Benchmarking, Code Generation | Code Available 1
Rephrase, Augment, Reason: Visual Grounding of Questions for Vision-Language Models | Oct 9, 2023 | Language Modelling, Question Answering | Code Available 1
Causal Reasoning through Two Layers of Cognition for Improving Generalization in Visual Question Answering | Oct 9, 2023 | Answer Generation, Question Answering | Unverified 0
Negative Object Presence Evaluation (NOPE) to Measure Object Hallucination in Vision-Language Models | Oct 9, 2023 | Hallucination, Object | Unverified 0
Improved Baselines with Visual Instruction Tuning | Oct 5, 2023 | Factual Inconsistency Detection in Chart Captioning, Image Classification | Code Available 6
Improving Automatic VQA Evaluation Using Large Language Models | Oct 4, 2023 | In-Context Learning, Question Answering | Unverified 0
On the Cognition of Visual Question Answering Models and Human Intelligence: A Comparative Study | Oct 4, 2023 | Question Answering, Visual Question Answering | Unverified 0
HallE-Control: Controlling Object Hallucination in Large Multimodal Models | Oct 3, 2023 | Attribute, Decoder | Code Available 1
SelfGraphVQA: A Self-Supervised Graph Neural Network for Scene-based Question Answering | Oct 3, 2023 | Graph Neural Network, Question Answering | Unverified 0
Navigating Cultural Chasms: Exploring and Unlocking the Cultural POV of Text-To-Image Models | Oct 3, 2023 | Image Generation, Visual Question Answering (VQA) | Code Available 0
Beyond Task Performance: Evaluating and Reducing the Flaws of Large Multimodal Models with In-Context Learning | Oct 1, 2023 | In-Context Learning, Instruction Following | Code Available 1
Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering | Sep 29, 2023 | Image to text, Passage Retrieval | Code Available 2
ELIP: Efficient Language-Image Pre-training with Fewer Vision Tokens | Sep 28, 2023 | Cross-Modal Retrieval, GPU | Code Available 0
Tackling VQA with Pretrained Foundation Models without Further Training | Sep 27, 2023 | Question Answering, Visual Question Answering | Unverified 0
InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition | Sep 26, 2023 | Articles, Image Comprehension | Code Available 0
Vulnerabilities in Video Quality Assessment Models: The Challenge of Adversarial Attacks | Sep 24, 2023 | Video Quality Assessment, Visual Question Answering (VQA) | Code Available 1
Implicit Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis | Sep 21, 2023 | Cross-Modal Retrieval, Image Captioning | Code Available 0
Sentence Attention Blocks for Answer Grounding | Sep 20, 2023 | Question Answering, Sentence | Unverified 0
Visual Question Answering in the Medical Domain | Sep 20, 2023 | Contrastive Learning, Medical Visual Question Answering | Unverified 0
Syntax Tree Constrained Graph Network for Visual Question Answering | Sep 17, 2023 | Question Answering, Visual Question Answering | Unverified 0
D3: Data Diversity Design for Systematic Generalization in Visual Question Answering | Sep 15, 2023 | Diversity, Question Answering | Code Available 0