HAAR: Text-Conditioned Generative Model of 3D Strand-based Human Hairstyles | Dec 18, 2023 | Question Answering; Visual Question Answering | Code Available | 1
An Evaluation of GPT-4V and Gemini in Online VQA | Dec 17, 2023 | Question Answering; Visual Question Answering | Unverified | 0
M^2ConceptBase: A Fine-Grained Aligned Concept-Centric Multimodal Knowledge Base | Dec 16, 2023 | cross-modal alignment; Knowledge Graphs | Code Available | 0
Advancing Surgical VQA with Scene Graph Knowledge | Dec 15, 2023 | Question Answering; Visual Question Answering | Unverified | 0
RankDVQA-mini: Knowledge Distillation-Driven Deep Video Quality Assessment | Dec 14, 2023 | Knowledge Distillation; Model Compression | Unverified | 0
CogAgent: A Visual Language Model for GUI Agents | Dec 14, 2023 | Language Modeling | Code Available | 5
BESTMVQA: A Benchmark Evaluation System for Medical Visual Question Answering | Dec 13, 2023 | Medical Visual Question Answering; Question Answering | Unverified | 0
ViLA: Efficient Video-Language Alignment for Video Question Answering | Dec 13, 2023 | cross-modal alignment; Language Modeling | Code Available | 1
Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator | Dec 11, 2023 | Image Captioning; Question Answering | Code Available | 1
NuScenes-MQA: Integrated Evaluation of Captions and QA for Autonomous Driving Datasets using Markup Annotations | Dec 11, 2023 | Autonomous Driving; Descriptive | Code Available | 1
Causal-CoG: A Causal-Effect Look at Context Generation for Boosting Multi-modal Language Models | Dec 9, 2023 | Question Answering; Visual Question Answering | Unverified | 0
Lyrics: Boosting Fine-grained Language-Vision Alignment and Comprehension via Semantic-aware Visual Objects | Dec 8, 2023 | Image Captioning; object-detection | Unverified | 0
Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos | Dec 7, 2023 | Diagnostic; Image Captioning | Code Available | 1
On the Robustness of Large Multimodal Models Against Image Adversarial Attacks | Dec 6, 2023 | Image Captioning; image-classification | Unverified | 0
Language-Informed Visual Concept Learning | Dec 6, 2023 | Disentanglement; Novel Concepts | Code Available | 1
Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models | Dec 5, 2023 | Language Modeling; Language Modelling | Unverified | 0
Recursive Visual Programming | Dec 4, 2023 | Code Generation; Question Answering | Code Available | 1
MedXChat: A Unified Multimodal Large Language Model Framework towards CXRs Understanding and Generation | Dec 4, 2023 | Instruction Following; Language Modeling | Unverified | 0
Unleashing the Potential of Large Language Model: Zero-shot VQA for Flood Disaster Scenario | Dec 4, 2023 | Language Modeling; Language Modelling | Unverified | 0
How to Configure Good In-Context Sequence for Visual Question Answering | Dec 4, 2023 | In-Context Learning; Question Answering | Code Available | 1
Zero-Shot Video Question Answering with Procedural Programs | Dec 1, 2023 | Code Generation; Language Modeling | Unverified | 0
Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts | Dec 1, 2023 | Chart Question Answering; Document AI | Unverified | 0
Towards Top-Down Reasoning: An Explainable Multi-Agent Approach for Visual Question Answering | Nov 29, 2023 | Common Sense Reasoning; Question Answering | Unverified | 0
Debiasing Multimodal Models via Causal Information Minimization | Nov 28, 2023 | Visual Question Answering (VQA) | Code Available | 1
The curse of language biases in remote sensing VQA: the role of spatial attributes, language diversity, and the need for clear evaluation | Nov 28, 2023 | Diversity; Question Answering | Unverified | 0
Fully Authentic Visual Question Answering Dataset from Online Communities | Nov 27, 2023 | Question Answering; Visual Question Answering | Code Available | 0
How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs | Nov 27, 2023 | Adversarial Robustness; Visual Question Answering (VQA) | Code Available | 1
Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training | Nov 23, 2023 | Multimodal Reasoning; Science Question Answering | Code Available | 1
From Wrong To Right: A Recursive Approach Towards Vision-Language Explanation | Nov 21, 2023 | Explanation Generation; Visual Question Answering (VQA) | Unverified | 0
KNVQA: A Benchmark for evaluation knowledge-based VQA | Nov 21, 2023 | Hallucination; Object Hallucination | Unverified | 0
Filling the Image Information Gap for VQA: Prompting Large Language Models to Proactively Ask Questions | Nov 20, 2023 | Question Answering; Visual Question Answering | Code Available | 0
Understanding and Mitigating Classification Errors Through Interpretable Token Patterns | Nov 18, 2023 | Classification; NER | Unverified | 0
HIDRO-VQA: High Dynamic Range Oracle for Video Quality Assessment | Nov 18, 2023 | Video Quality Assessment; Visual Question Answering (VQA) | Code Available | 1
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection | Nov 16, 2023 | Language Modeling; Language Modelling | Code Available | 4
Multiple-Question Multiple-Answer Text-VQA | Nov 15, 2023 | Decoder; Denoising | Unverified | 0
Improving Zero-shot Visual Question Answering via Large Language Models with Reasoning Question Prompts | Nov 15, 2023 | Question Answering; Sentence | Code Available | 0
Attribute Diversity Determines the Systematicity Gap in VQA | Nov 15, 2023 | Attribute; Diagnostic | Code Available | 0
Asking More Informative Questions for Grounded Retrieval | Nov 14, 2023 | Question Answering; Question Selection | Unverified | 0
A Comprehensive Evaluation of GPT-4V on Knowledge-Intensive Visual Question Answering | Nov 13, 2023 | Decision Making; Explanation Generation | Code Available | 1
CLiF-VQA: Enhancing Video Quality Assessment by Incorporating High-Level Semantic Information related to Human Feelings | Nov 13, 2023 | Video Quality Assessment; Visual Question Answering (VQA) | Unverified | 0
What Large Language Models Bring to Text-rich VQA? | Nov 13, 2023 | Image Comprehension; Optical Character Recognition (OCR) | Unverified | 0
SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models | Nov 13, 2023 | Described Object Detection; Language Modeling | Code Available | 4
InfMLLM: A Unified Framework for Visual-Language Tasks | Nov 12, 2023 | GPU; Image Captioning | Code Available | 1
Visual Commonsense based Heterogeneous Graph Contrastive Learning | Nov 11, 2023 | Contrastive Learning; Question Answering | Unverified | 0
Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models | Nov 11, 2023 | Image Captioning; MMR total | Code Available | 3
Analyzing Modular Approaches for Visual Question Decomposition | Nov 10, 2023 | Code Generation; Visual Question Answering (VQA) | Code Available | 0
Improving Vision-and-Language Reasoning via Spatial Relations Modeling | Nov 9, 2023 | Position regression; Relation | Unverified | 0
Zero-shot Translation of Attention Patterns in VQA Models to Natural Language | Nov 8, 2023 | Image Captioning; Language Modeling | Code Available | 0
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration | Nov 7, 2023 | 1 Image, 2*2 Stitching; Decoder | Code Available | 4
CogVLM: Visual Expert for Pretrained Language Models | Nov 6, 2023 | 1 Image, 2*2 Stitching; FS-MEVQA | Code Available | 5