NuScenes-MQA: Integrated Evaluation of Captions and QA for Autonomous Driving Datasets using Markup Annotations Dec 11, 2023 Autonomous Driving Descriptive
Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos Dec 7, 2023 Diagnostic Image Captioning
Language-Informed Visual Concept Learning Dec 6, 2023 Disentanglement Novel Concepts
Recursive Visual Programming Dec 4, 2023 Code Generation Question Answering
How to Configure Good In-Context Sequence for Visual Question Answering Dec 4, 2023 In-Context Learning Question Answering
Debiasing Multimodal Models via Causal Information Minimization Nov 28, 2023 Visual Question Answering (VQA)
How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs Nov 27, 2023 Adversarial Robustness Visual Question Answering (VQA)
Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training Nov 23, 2023 Multimodal Reasoning Science Question Answering
HIDRO-VQA: High Dynamic Range Oracle for Video Quality Assessment Nov 18, 2023 Video Quality Assessment Visual Question Answering (VQA)
A Comprehensive Evaluation of GPT-4V on Knowledge-Intensive Visual Question Answering Nov 13, 2023 Decision Making Explanation Generation
InfMLLM: A Unified Framework for Visual-Language Tasks Nov 12, 2023 GPU Image Captioning
GPT-4V-AD: Exploring Grounding Potential of VQA-oriented GPT-4V for Zero-shot Anomaly Detection Nov 5, 2023 Anomaly Detection Question Answering
Language Guided Visual Question Answering: Elevate Your Multimodal Language Model Using Knowledge-Enriched Prompts Oct 31, 2023 Image Captioning Language Modeling
Multimodal ChatGPT for Medical Applications: an Experimental Study of GPT-4V Oct 29, 2023 Diagnostic Language Modeling
EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images Oct 28, 2023 Decision Making Medical Visual Question Answering
3D-Aware Visual Question Answering about Parts, Poses and Occlusions Oct 27, 2023 Question Answering Visual Question Answering
Large Language Models are Temporal and Causal Reasoners for Video Question Answering Oct 24, 2023 Natural Language Understanding Question Answering
Towards Perceiving Small Visual Details in Zero-shot Visual Question Answering with Multimodal LLMs Oct 24, 2023 Question Answering Visual Question Answering
VLIS: Unimodal Language Models Guide Multimodal Language Generation Oct 15, 2023 Caption Generation Explanation Generation
PaLI-3 Vision Language Models: Smaller, Faster, Stronger Oct 13, 2023 Chart Question Answering Cross-Modal Retrieval
What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models Oct 10, 2023 Benchmarking Code Generation
Rephrase, Augment, Reason: Visual Grounding of Questions for Vision-Language Models Oct 9, 2023 Language Modelling Question Answering
HallE-Control: Controlling Object Hallucination in Large Multimodal Models Oct 3, 2023 Attribute Decoder
Beyond Task Performance: Evaluating and Reducing the Flaws of Large Multimodal Models with In-Context Learning Oct 1, 2023 In-Context Learning Instruction Following
Vulnerabilities in Video Quality Assessment Models: The Challenge of Adversarial Attacks Sep 24, 2023 Video Quality Assessment Visual Question Answering (VQA)
Can I Trust Your Answer? Visually Grounded Video Question Answering Sep 4, 2023 Grounded Video Question Answering Question Answering
VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control Aug 18, 2023 Image Captioning Text Generation
Open-vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models Aug 18, 2023 Multiple-choice Question Answering
Uni-NLX: Unifying Textual Explanations for Vision and Vision-Language Tasks Aug 17, 2023 Question Answering Text Generation
Pro-Cap: Leveraging a Frozen Vision-Language Model for Hateful Meme Detection Aug 16, 2023 Image Captioning Language Modeling
Detecting and Preventing Hallucinations in Large Vision Language Models Aug 11, 2023 16k Hallucination
StableVQA: A Deep No-Reference Quality Assessment Model for Video Stability Aug 9, 2023 Optical Flow Estimation Video Quality Assessment
SciGraphQA: A Large-Scale Synthetic Multi-Turn Question-Answering Dataset for Scientific Graphs Aug 7, 2023 Question Answering Visual Question Answering
Analysis of Video Quality Datasets via Design of Minimalistic Video Quality Models Jul 26, 2023 Image Quality Assessment No-Reference Image Quality Assessment
Expert Knowledge-Aware Image Difference Graph Representation Learning for Difference-Aware Medical Visual Question Answering Jul 22, 2023 Graph Representation Learning Language Modeling
Explaining Autonomous Driving Actions with Visual Question Answering Jul 19, 2023 Autonomous Driving Autonomous Vehicles
CAT-ViL: Co-Attention Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surgery Jul 11, 2023 Question Answering Scene Understanding
Rad-ReStruct: A Novel VQA Benchmark and Method for Structured Radiology Reporting Jul 11, 2023 Medical Visual Question Answering Question Answering
Self-Adaptive Sampling for Efficient Video Question-Answering on Image--Text Models Jul 9, 2023 Question Answering TGIF-Frame
Localized Questions in Medical Visual Question Answering Jul 3, 2023 Medical Visual Question Answering Question Answering
UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding Jul 3, 2023 Image-text matching Sentence
Multimodal Prompt Retrieval for Generative Visual Question Answering Jun 30, 2023 Domain Adaptation Generative Visual Question Answering
Answer Mining from a Pool of Images: Towards Retrieval-Based Visual Question Answering Jun 29, 2023 Answer Generation Question Answering
Kosmos-2: Grounding Multimodal Large Language Models to the World Jun 26, 2023 Image Captioning In-Context Learning
FunQA: Towards Surprising Video Comprehension Jun 26, 2023 Question Answering Text Generation
Investigating Prompting Techniques for Zero- and Few-Shot Visual Question Answering Jun 16, 2023 Image Captioning Question Answering
COSA: Concatenated Sample Pretrained Vision-Language Foundation Model Jun 15, 2023 Form model
Improving Selective Visual Question Answering by Learning from Your Peers Jun 14, 2023 Question Answering Visual Question Answering
Scalable Neural-Probabilistic Answer Set Programming Jun 14, 2023 Probabilistic Programming Question Answering
Modular Visual Question Answering via Code Generation Jun 8, 2023 Code Generation In-Context Learning