Pyramid Coder: Hierarchical Code Generator for Compositional Visual Question Answering Jul 30, 2024 Code Generation Question Answering
— Unverified 0Take A Step Back: Rethinking the Two Stages in Visual Reasoning Jul 29, 2024 Logical Reasoning Question Answering
— Unverified 0Improved Few-Shot Image Classification Through Multiple-Choice Questions Jul 23, 2024 Articles Few-Shot Image Classification
— Unverified 0Exploring the Effectiveness of Object-Centric Representations in Visual Question Answering: Comparative Insights with Foundation Models Jul 22, 2024 Question Answering Representation Learning
— Unverified 0QuIIL at T3 challenge: Towards Automation in Life-Saving Intervention Procedures from First-Person View Jul 18, 2024 Action Anticipation Action Recognition
Code Code Available 0ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data Jul 17, 2024 Question Answering Visual Question Answering
— Unverified 0Multimodal Reranking for Knowledge-Intensive Visual Question Answering Jul 17, 2024 Answer Generation Question Answering
— Unverified 0EchoSight: Advancing Visual-Language Models with Wiki Knowledge Jul 17, 2024 Articles Question Answering
— Unverified 0TM-PATHVQA: 90000+ Textless Multilingual Questions for Medical Visual Question Answering Jul 16, 2024 Medical Visual Question Answering Question Answering
— Unverified 0Extracting Training Data from Document-Based VQA Models Jul 11, 2024 Memorization Question Answering
— Unverified 0Segmentation-guided Attention for Visual Question Answering from Remote Sensing Images Jul 11, 2024 Question Answering Segmentation
— Unverified 0VQA-Diff: Exploiting VQA and Diffusion for Zero-Shot Image-to-3D Vehicle Asset Generation in Autonomous Driving Jul 9, 2024 Autonomous Driving Image to 3D
— Unverified 0Large Language Models Understand Layout Jul 8, 2024 Question Answering Visual Question Answering
Code Code Available 0CLIPVQA: Video Quality Assessment via CLIP Jul 6, 2024 Video Quality Assessment Visual Question Answering (VQA)
Code Code Available 0Black-box Model Ensembling for Textual and Visual Question Answering via Information Fusion Jul 4, 2024 Question Answering Visual Question Answering
Code Code Available 0Visual Robustness Benchmark for Visual Question Answering (VQA) Jul 3, 2024 Visual Question Answering Visual Question Answering (VQA)
Code Code Available 0BACON: Improving Clarity of Image Captions via Bag-of-Concept Graphs Jul 3, 2024 Image Captioning Image Generation
— Unverified 0MindBench: A Comprehensive Benchmark for Mind Map Structure Recognition and Analysis Jul 3, 2024 Position Question Answering
— Unverified 0D-Rax: Domain-specific Radiologic assistant leveraging multi-modal data and eXpert model predictions Jul 2, 2024 Diagnostic Instruction Following
— Unverified 0https://arxiv.org/abs/2407.00634 Jul 2, 2024 Video Captioning Video Description
Code Code Available 0Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness Jul 2, 2024 Image Captioning Question Answering
— Unverified 0μ-Bench: A Vision-Language Benchmark for Microscopy Understanding Jul 1, 2024 Cell Detection Classification
Code Code Available 0Hierarchical Memory for Long Video QA Jun 30, 2024 GPU Question Answering
— Unverified 0SK-VQA: Synthetic Knowledge Generation at Scale for Training Context-Augmented Multimodal LLMs Jun 28, 2024 RAG Retrieval-augmented Generation
— Unverified 0Disentangling Knowledge-based and Visual Reasoning by Question Decomposition in KB-VQA Jun 27, 2024 General Knowledge Question Answering
— Unverified 0RAVEN: Multitask Retrieval Augmented Vision-Language Learning Jun 27, 2024 Image Captioning RAG
— Unverified 0Enhancing Continual Learning in Visual Question Answering with Modality-Aware Feature Distillation Jun 27, 2024 Continual Learning Question Answering
Code Code Available 0On the Role of Visual Grounding in VQA Jun 26, 2024 Visual Grounding Visual Question Answering (VQA)
— Unverified 0MM-SpuBench: Towards Better Understanding of Spurious Biases in Multimodal LLMs Jun 24, 2024 Question Answering Visual Question Answering
— Unverified 0Priorformer: A UGC-VQA Method with content and distortion priors Jun 24, 2024 Video Quality Assessment Visual Question Answering (VQA)
— Unverified 0Losing Visual Needles in Image Haystacks: Vision Language Models are Easily Distracted in Short and Long Contexts Jun 24, 2024 Mathematical Reasoning Visual Question Answering (VQA)
— Unverified 0Tri-VQA: Triangular Reasoning Medical Visual Question Answering for Multi-Attribute Analysis Jun 21, 2024 Attribute Medical Visual Question Answering
— Unverified 0VGA: Vision GUI Assistant -- Minimizing Hallucinations through Image-Centric Fine-Tuning Jun 20, 2024 Image Comprehension Question Answering
Code Code Available 0Biomedical Visual Instruction Tuning with Clinician Preference Alignment Jun 19, 2024 Instruction Following Visual Question Answering (VQA)
Code Code Available 0Diversify, Rationalize, and Combine: Ensembling Multiple QA Strategies for Zero-shot Knowledge-based VQA Jun 18, 2024 Question Answering Visual Question Answering
Code Code Available 0Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model Jun 15, 2024 Question Answering Video Understanding
Code Code Available 0What is the Visual Cognition Gap between Humans and Multimodal LLMs? Jun 14, 2024 Object Detection
Code Code Available 0Precision Empowers, Excess Distracts: Visual Question Answering With Dynamically Infused Knowledge In Language Models Jun 14, 2024 Decoder Knowledge Graphs
— Unverified 0Optimizing Visual Question Answering Models for Driving: Bridging the Gap Between Human and Machine Attention Patterns Jun 13, 2024 Autonomous Driving Question Answering
— Unverified 0CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark Jun 10, 2024 Diversity Question Answering
— Unverified 0Composition Vision-Language Understanding via Segment and Depth Anything Model Jun 7, 2024 Question Answering Visual Question Answering (VQA)
Code Code Available 0Understanding Information Storage and Transfer in Multi-modal Large Language Models Jun 6, 2024 Factual Visual Question Answering Model Editing
— Unverified 0Diffusion-Refined VQA Annotations for Semi-Supervised Gaze Following Jun 4, 2024 Question Answering Visual Question Answering
Code Code Available 0Translation Deserves Better: Analyzing Translation Artifacts in Cross-lingual Visual Question Answering Jun 4, 2024 Data Augmentation Machine Translation
— Unverified 0Mixture of Rationale: Multi-Modal Reasoning Mixture for Visual Question Answering Jun 3, 2024 Diversity Question Answering
— Unverified 0Selectively Answering Visual Questions Jun 3, 2024 Avg In-Context Learning
— Unverified 0VQA Training Sets are Self-play Environments for Generating Few-shot Pools May 30, 2024 Question Answering Visual Question Answering
— Unverified 0Evaluating Zero-Shot GPT-4V Performance on 3D Visual Question Answering Benchmarks May 29, 2024 Question Answering Visual Question Answering
— Unverified 0PTM-VQA: Efficient Video Quality Assessment Leveraging Diverse PreTrained Models from the Wild May 28, 2024 Video Quality Assessment Visual Question Answering (VQA)
— Unverified 0Privacy-Aware Visual Language Models May 27, 2024 Visual Question Answering (VQA)