Towards a Unified Multimodal Reasoning Framework | Dec 22, 2023 | Multimodal Reasoning, Multiple-choice | Code Available
LLM4VG: Large Language Models Evaluation for Video Grounding | Dec 21, 2023 | Image Captioning, Video Grounding | Unverified
Reducing Hallucinations: Enhancing VQA for Flood Disaster Damage Assessment with Visual Contexts | Dec 21, 2023 | Hallucination, Question Answering | Unverified
Object Attribute Matters in Visual Question Answering | Dec 20, 2023 | Attribute, Graph Neural Network | Code Available
BloomVQA: Assessing Hierarchical Multi-modal Comprehension | Dec 20, 2023 | Data Augmentation, Memorization | Unverified
Multi-Clue Reasoning with Memory Augmentation for Knowledge-based Visual Question Answering | Dec 20, 2023 | Question Answering, Visual Question Answering | Unverified
Interactive Visual Task Learning for Robots | Dec 20, 2023 | Continual Learning, Novel Concepts | Unverified
VQA4CIR: Boosting Composed Image Retrieval with Visual Question Answering | Dec 19, 2023 | Image Retrieval, Question Answering | Code Available
Full-reference Video Quality Assessment for User Generated Content Transcoding | Dec 19, 2023 | Video Quality Assessment, Visual Question Answering (VQA) | Unverified
An Evaluation of GPT-4V and Gemini in Online VQA | Dec 17, 2023 | Question Answering, Visual Question Answering | Unverified
M^2ConceptBase: A Fine-Grained Aligned Concept-Centric Multimodal Knowledge Base | Dec 16, 2023 | cross-modal alignment, Knowledge Graphs | Code Available
Advancing Surgical VQA with Scene Graph Knowledge | Dec 15, 2023 | Question Answering, Visual Question Answering | Unverified
RankDVQA-mini: Knowledge Distillation-Driven Deep Video Quality Assessment | Dec 14, 2023 | Knowledge Distillation, Model Compression | Unverified
BESTMVQA: A Benchmark Evaluation System for Medical Visual Question Answering | Dec 13, 2023 | Medical Visual Question Answering, Question Answering | Unverified
Causal-CoG: A Causal-Effect Look at Context Generation for Boosting Multi-modal Language Models | Dec 9, 2023 | Question Answering, Visual Question Answering | Unverified
Lyrics: Boosting Fine-grained Language-Vision Alignment and Comprehension via Semantic-aware Visual Objects | Dec 8, 2023 | Image Captioning, object-detection | Unverified
On the Robustness of Large Multimodal Models Against Image Adversarial Attacks | Dec 6, 2023 | Image Captioning, image-classification | Unverified
Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models | Dec 5, 2023 | Language Modeling, Language Modelling | Unverified
MedXChat: A Unified Multimodal Large Language Model Framework towards CXRs Understanding and Generation | Dec 4, 2023 | Instruction Following, Language Modeling | Unverified
Unleashing the Potential of Large Language Model: Zero-shot VQA for Flood Disaster Scenario | Dec 4, 2023 | Language Modeling, Language Modelling | Unverified
Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts | Dec 1, 2023 | Chart Question Answering, Document AI | Unverified
Zero-Shot Video Question Answering with Procedural Programs | Dec 1, 2023 | Code Generation, Language Modeling | Unverified
Towards Top-Down Reasoning: An Explainable Multi-Agent Approach for Visual Question Answering | Nov 29, 2023 | Common Sense Reasoning, Question Answering | Unverified
The curse of language biases in remote sensing VQA: the role of spatial attributes, language diversity, and the need for clear evaluation | Nov 28, 2023 | Diversity, Question Answering | Unverified
Fully Authentic Visual Question Answering Dataset from Online Communities | Nov 27, 2023 | Question Answering, Visual Question Answering | Code Available
From Wrong To Right: A Recursive Approach Towards Vision-Language Explanation | Nov 21, 2023 | Explanation Generation, Visual Question Answering (VQA) | Unverified
KNVQA: A Benchmark for evaluation knowledge-based VQA | Nov 21, 2023 | Hallucination, Object Hallucination | Unverified
Filling the Image Information Gap for VQA: Prompting Large Language Models to Proactively Ask Questions | Nov 20, 2023 | Question Answering, Visual Question Answering | Code Available
Understanding and Mitigating Classification Errors Through Interpretable Token Patterns | Nov 18, 2023 | Classification, NER | Unverified
Attribute Diversity Determines the Systematicity Gap in VQA | Nov 15, 2023 | Attribute, Diagnostic | Code Available
Improving Zero-shot Visual Question Answering via Large Language Models with Reasoning Question Prompts | Nov 15, 2023 | Question Answering, Sentence | Code Available
Multiple-Question Multiple-Answer Text-VQA | Nov 15, 2023 | Decoder, Denoising | Unverified
Asking More Informative Questions for Grounded Retrieval | Nov 14, 2023 | Question Answering, Question Selection | Unverified
CLiF-VQA: Enhancing Video Quality Assessment by Incorporating High-Level Semantic Information related to Human Feelings | Nov 13, 2023 | Video Quality Assessment, Visual Question Answering (VQA) | Unverified
What Large Language Models Bring to Text-rich VQA? | Nov 13, 2023 | Image Comprehension, Optical Character Recognition (OCR) | Unverified
Visual Commonsense based Heterogeneous Graph Contrastive Learning | Nov 11, 2023 | Contrastive Learning, Question Answering | Unverified
Analyzing Modular Approaches for Visual Question Decomposition | Nov 10, 2023 | Code Generation, Visual Question Answering (VQA) | Code Available
Improving Vision-and-Language Reasoning via Spatial Relations Modeling | Nov 9, 2023 | Position regression, Relation | Unverified
Zero-shot Translation of Attention Patterns in VQA Models to Natural Language | Nov 8, 2023 | Image Captioning, Language Modeling | Code Available
From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities | Nov 1, 2023 | Navigate, Question Answering | Unverified
VQA-GEN: A Visual Question Answering Benchmark for Domain Generalization | Nov 1, 2023 | Domain Generalization, Question Answering | Unverified
A Systematic Evaluation of GPT-4V's Multimodal Capability for Medical Image Analysis | Oct 31, 2023 | Descriptive, Medical Image Analysis | Unverified
Dynamic Task and Weight Prioritization Curriculum Learning for Multimodal Imagery | Oct 29, 2023 | Deep Learning, Multimodal Deep Learning | Code Available
ViCLEVR: A Visual Reasoning Dataset and Hybrid Multimodal Fusion Model for Visual Question Answering in Vietnamese | Oct 27, 2023 | Information Retrieval, Natural Language Queries | Code Available
Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation | Oct 27, 2023 | Image Generation, Question Answering | Unverified
Incorporating Probing Signals into Multimodal Machine Translation via Visual Question-Answering Pairs | Oct 26, 2023 | Attribute, Machine Translation | Code Available
Exploring Question Decomposition for Zero-Shot VQA | Oct 25, 2023 | Question Answering, Visual Question Answering | Unverified
Geometry-Aware Video Quality Assessment for Dynamic Digital Human | Oct 24, 2023 | Attribute, Video Quality Assessment | Unverified
LXMERT Model Compression for Visual Question Answering | Oct 23, 2023 | model, Model Compression | Code Available
A Simple Baseline for Knowledge-Based Visual Question Answering | Oct 20, 2023 | In-Context Learning, Question Answering | Code Available