Enhanced Textual Feature Extraction for Visual Question Answering: A Simple Convolutional Approach | May 1, 2024 | Computational Efficiency, Question Answering | Unverified 0
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation | Apr 30, 2024 | Caption Generation, Hallucination | Unverified 0
Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism | Apr 29, 2024 | document understanding, GPU | Code Available 0
ViOCRVQA: Novel Benchmark Dataset and Vision Reader for Visual Question Answering by Understanding Vietnamese Text in Images | Apr 29, 2024 | Optical Character Recognition, Optical Character Recognition (OCR) | Code Available 1
NTIRE 2024 Quality Assessment of AI-Generated Content Challenge | Apr 25, 2024 | Image Quality Assessment, Image Restoration | Unverified 0
RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis | Apr 25, 2024 | Segmentation, Sentence | Code Available 0
AIS 2024 Challenge on Video Quality Assessment of User-Generated Content: Methods and Results | Apr 24, 2024 | Video Quality Assessment, Visual Question Answering (VQA) | Code Available 0
Fusion of Domain-Adapted Vision and Language Models for Medical Visual Question Answering | Apr 24, 2024 | Language Modeling, Language Modelling | Unverified 0
MDAgents: An Adaptive Collaboration of LLMs for Medical Decision-Making | Apr 22, 2024 | Decision Making, Medical Diagnosis | Code Available 3
Self-Bootstrapped Visual-Language Model for Knowledge Selection and Question Answering | Apr 22, 2024 | Language Modeling, Language Modelling | Code Available 0
Exploring Diverse Methods in Visual Question Answering | Apr 21, 2024 | Question Answering, Visual Question Answering | Unverified 0
TextSquare: Scaling up Text-Centric Visual Instruction Tuning | Apr 19, 2024 | Hallucination, Hallucination Evaluation | Unverified 0
PDF-MVQA: A Dataset for Multimodal Information Retrieval in PDF-based Visual Question Answering | Apr 19, 2024 | Articles, Information Retrieval | Unverified 0
Unified Scene Representation and Reconstruction for 3D Large Language Models | Apr 19, 2024 | 3D Reconstruction, Scene Understanding | Unverified 0
Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning | Apr 19, 2024 | Benchmarking, counterfactual | Unverified 0
LaPA: Latent Prompt Assist Model For Medical Visual Question Answering | Apr 19, 2024 | Medical Visual Question Answering, Question Answering | Code Available 1
MedThink: Explaining Medical Visual Question Answering via Multimodal Decision-Making Rationale | Apr 18, 2024 | Decision Making, Medical Visual Question Answering | Unverified 0
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models | Apr 18, 2024 | GSM8K, MMLU | Unverified 0
NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results | Apr 17, 2024 | Form, valid | Code Available 2
Med-MoE: Mixture of Domain-Specific Experts for Lightweight Medical Vision-Language Models | Apr 16, 2024 | image-classification, Image Classification | Code Available 2
ViTextVQA: A Large-Scale Visual Question Answering Dataset for Evaluating Vietnamese Text Comprehension in Images | Apr 16, 2024 | Multimodal Deep Learning, Optical Character Recognition (OCR) | Code Available 0
Find The Gap: Knowledge Base Reasoning For Visual Question Answering | Apr 16, 2024 | Question Answering, Retrieval | Unverified 0
TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding | Apr 15, 2024 | Question Answering, Visual Question Answering (VQA) | Code Available 1
Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts | Apr 12, 2024 | Image Captioning, Question Answering | Code Available 1
Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs | Apr 11, 2024 | Descriptive, Hallucination | Code Available 0
BRAVE: Broadening the visual encoding of vision-language models | Apr 10, 2024 | Hallucination, Language Modelling | Unverified 0
OmniFusion Technical Report | Apr 9, 2024 | MM-Vet, TextVQA | Code Available 0
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding | Apr 8, 2024 | GPU, Multiple-choice | Code Available 3
HAMMR: HierArchical MultiModal React agents for generic VQA | Apr 8, 2024 | Optical Character Recognition (OCR), Question Answering | Unverified 0
Study of the effect of Sharpness on Blind Video Quality Assessment | Apr 6, 2024 | SSIM, Video Quality Assessment | Unverified 0
Joint Visual and Text Prompting for Improved Object-Centric Perception with Multimodal Large Language Models | Apr 6, 2024 | MME, Object | Code Available 0
BuDDIE: A Business Document Dataset for Multi-task Information Extraction | Apr 5, 2024 | Document Classification, document understanding | Unverified 0
TinyVQA: Compact Multimodal Deep Neural Network for Visual Question Answering on Resource-Constrained Devices | Apr 4, 2024 | Quantization, Question Answering | Unverified 0
Detect2Interact: Localizing Object Key Field in Visual Question Answering (VQA) with LLMs | Apr 1, 2024 | Common Sense Reasoning, Object | Unverified 0
Evaluating Text-to-Visual Generation with Image-to-Text Generation | Apr 1, 2024 | Image to text, Question Answering | Code Available 3
Design as Desired: Utilizing Visual Question Answering for Multimodal Pre-training | Mar 30, 2024 | Contrastive Learning, Question Answering | Code Available 0
Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models | Mar 29, 2024 | Question Answering, Visual Question Answering | Code Available 2
JDocQA: Japanese Document Question Answering Dataset for Generative Language Models | Mar 28, 2024 | Hallucination, Question Answering | Code Available 1
Quantifying and Mitigating Unimodal Biases in Multimodal Large Language Models: A Causal Perspective | Mar 27, 2024 | Question Answering, Visual Question Answering | Code Available 1
Visual Hallucination: Definition, Quantification, and Prescriptive Remediations | Mar 26, 2024 | Hallucination, Image Captioning | Unverified 0
A Gaze-grounded Visual Question Answering Dataset for Clarifying Ambiguous Japanese Questions | Mar 26, 2024 | Gaze Target Estimation, Question Answering | Unverified 0
Intrinsic Subgraph Generation for Interpretable Graph based Visual Question Answering | Mar 26, 2024 | Decision Making, Explainable artificial intelligence | Code Available 0
Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning | Mar 25, 2024 | Visual Question Answering (VQA) | Code Available 3
Synthesize Step-by-Step: Tools, Templates and LLMs as Data Generators for Reasoning-Based Chart VQA | Mar 25, 2024 | Chart Question Answering, Data Augmentation | Unverified 0
IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models | Mar 23, 2024 | Common Sense Reasoning, In-Context Learning | Code Available 1
Surgical-LVLM: Learning to Adapt Large Vision-Language Model for Grounded Visual Question Answering in Robotic Surgery | Mar 22, 2024 | Language Modeling, Language Modelling | Unverified 0
MedPromptX: Grounded Multimodal Prompting for Chest X-ray Diagnosis | Mar 22, 2024 | Medical Diagnosis, Medical Visual Question Answering | Code Available 2
Multi-Agent VQA: Exploring Multi-Agent Foundation Models in Zero-Shot Visual Question Answering | Mar 21, 2024 | object-detection, Object Detection | Code Available 1
Multi-Modal Hallucination Control by Visual Information Grounding | Mar 20, 2024 | Hallucination, Visual Question Answering (VQA) | Unverified 0
vid-TLDR: Training Free Token merging for Light-weight Video Transformer | Mar 20, 2024 | Action Recognition, Computational Efficiency | Code Available 2