Title | Date | Tags | Code
SearchLVLMs: A Plug-and-Play Framework for Augmenting Large Vision-Language Models by Searching Up-to-Date Internet Knowledge | May 23, 2024 | Question Answering, RAG | Unverified
Dataset and Benchmark for Urdu Natural Scenes Text Detection, Recognition and Visual Question Answering | May 21, 2024 | Diversity, Information Retrieval | Code Available
Detecting Multimodal Situations with Insufficient Context and Abstaining from Baseless Predictions | May 18, 2024 | Visual Question Answering (VQA) | Unverified
EyeFound: A Multimodal Generalist Foundation Model for Ophthalmic Imaging | May 18, 2024 | Question Answering, Visual Question Answering | Unverified
StackOverflowVQA: Stack Overflow Visual Question Answering Dataset | May 17, 2024 | Question Answering, Sentence | Unverified
RMT-BVQA: Recurrent Memory Transformer-based Blind Video Quality Assessment for Enhanced Video Content | May 14, 2024 | Contrastive Learning, Video Enhancement | Unverified
Realizing Visual Question Answering for Education: GPT-4V as a Multimodal AI | May 12, 2024 | Question Answering, Visual Question Answering | Unverified
Federated Document Visual Question Answering: A Pilot Study | May 10, 2024 | Federated Learning, Question Answering | Code Available
Is the House Ready For Sleeptime? Generating and Evaluating Situational Queries for Embodied Question Answering | May 8, 2024 | 2k, Embodied Question Answering | Unverified
Advancing Multimodal Medical Capabilities of Gemini | May 6, 2024 | Computed Tomography (CT), image-classification | Unverified
VSA4VQA: Scaling a Vector Symbolic Architecture to Visual Question Answering on Natural Images | May 6, 2024 | Attribute, Language Modeling | Unverified
Beyond Human Vision: The Role of Large Vision Language Models in Microscope Image Analysis | May 1, 2024 | Image Captioning, Question Answering | Unverified
Enhanced Textual Feature Extraction for Visual Question Answering: A Simple Convolutional Approach | May 1, 2024 | Computational Efficiency, Question Answering | Unverified
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation | Apr 30, 2024 | Caption Generation, Hallucination | Unverified
Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism | Apr 29, 2024 | document understanding, GPU | Code Available
NTIRE 2024 Quality Assessment of AI-Generated Content Challenge | Apr 25, 2024 | Image Quality Assessment, Image Restoration | Unverified
RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis | Apr 25, 2024 | Segmentation, Sentence | Code Available
Fusion of Domain-Adapted Vision and Language Models for Medical Visual Question Answering | Apr 24, 2024 | Language Modeling, Language Modelling | Unverified
AIS 2024 Challenge on Video Quality Assessment of User-Generated Content: Methods and Results | Apr 24, 2024 | Video Quality Assessment, Visual Question Answering (VQA) | Code Available
Self-Bootstrapped Visual-Language Model for Knowledge Selection and Question Answering | Apr 22, 2024 | Language Modeling, Language Modelling | Code Available
Exploring Diverse Methods in Visual Question Answering | Apr 21, 2024 | Question Answering, Visual Question Answering | Unverified
Unified Scene Representation and Reconstruction for 3D Large Language Models | Apr 19, 2024 | 3D Reconstruction, Scene Understanding | Unverified
PDF-MVQA: A Dataset for Multimodal Information Retrieval in PDF-based Visual Question Answering | Apr 19, 2024 | Articles, Information Retrieval | Unverified
TextSquare: Scaling up Text-Centric Visual Instruction Tuning | Apr 19, 2024 | Hallucination, Hallucination Evaluation | Unverified
Look Before You Decide: Prompting Active Deduction of MLLMs for Assumptive Reasoning | Apr 19, 2024 | Benchmarking, counterfactual | Unverified
MedThink: Explaining Medical Visual Question Answering via Multimodal Decision-Making Rationale | Apr 18, 2024 | Decision Making, Medical Visual Question Answering | Unverified
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models | Apr 18, 2024 | GSM8K, MMLU | Unverified
ViTextVQA: A Large-Scale Visual Question Answering Dataset for Evaluating Vietnamese Text Comprehension in Images | Apr 16, 2024 | Multimodal Deep Learning, Optical Character Recognition (OCR) | Code Available
Find The Gap: Knowledge Base Reasoning For Visual Question Answering | Apr 16, 2024 | Question Answering, Retrieval | Unverified
Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs | Apr 11, 2024 | Descriptive, Hallucination | Code Available
BRAVE: Broadening the visual encoding of vision-language models | Apr 10, 2024 | Hallucination, Language Modelling | Unverified
OmniFusion Technical Report | Apr 9, 2024 | MM-Vet, TextVQA | Code Available
HAMMR: HierArchical MultiModal React agents for generic VQA | Apr 8, 2024 | Optical Character Recognition (OCR), Question Answering | Unverified
Joint Visual and Text Prompting for Improved Object-Centric Perception with Multimodal Large Language Models | Apr 6, 2024 | MME, Object | Code Available
Study of the effect of Sharpness on Blind Video Quality Assessment | Apr 6, 2024 | SSIM, Video Quality Assessment | Unverified
BuDDIE: A Business Document Dataset for Multi-task Information Extraction | Apr 5, 2024 | Document Classification, document understanding | Unverified
TinyVQA: Compact Multimodal Deep Neural Network for Visual Question Answering on Resource-Constrained Devices | Apr 4, 2024 | Quantization, Question Answering | Unverified
Detect2Interact: Localizing Object Key Field in Visual Question Answering (VQA) with LLMs | Apr 1, 2024 | Common Sense Reasoning, Object | Unverified
Design as Desired: Utilizing Visual Question Answering for Multimodal Pre-training | Mar 30, 2024 | Contrastive Learning, Question Answering | Code Available
Visual Hallucination: Definition, Quantification, and Prescriptive Remediations | Mar 26, 2024 | Hallucination, Image Captioning | Unverified
Intrinsic Subgraph Generation for Interpretable Graph based Visual Question Answering | Mar 26, 2024 | Decision Making, Explainable artificial intelligence | Code Available
A Gaze-grounded Visual Question Answering Dataset for Clarifying Ambiguous Japanese Questions | Mar 26, 2024 | Gaze Target Estimation, Question Answering | Unverified
Synthesize Step-by-Step: Tools, Templates and LLMs as Data Generators for Reasoning-Based Chart VQA | Mar 25, 2024 | Chart Question Answering, Data Augmentation | Unverified
Surgical-LVLM: Learning to Adapt Large Vision-Language Model for Grounded Visual Question Answering in Robotic Surgery | Mar 22, 2024 | Language Modeling, Language Modelling | Unverified
Multi-Modal Hallucination Control by Visual Information Grounding | Mar 20, 2024 | Hallucination, Visual Question Answering (VQA) | Unverified
AGFSync: Leveraging AI-Generated Feedback for Preference Optimization in Text-to-Image Generation | Mar 20, 2024 | Image Generation, Text to Image Generation | Unverified
WoLF: Wide-scope Large Language Model Framework for CXR Understanding | Mar 19, 2024 | Anatomy, Instruction Following | Unverified
SpatialPIN: Enhancing Spatial Reasoning Capabilities of Vision-Language Models through Prompting and Interacting 3D Priors | Mar 18, 2024 | Hallucination, Motion Planning | Unverified
FlexCap: Describe Anything in Images in Controllable Detail | Mar 18, 2024 | Attribute, Dense Captioning | Unverified
Few-Shot VQA with Frozen LLMs: A Tale of Two Approaches | Mar 17, 2024 | Image Captioning, Question Answering | Unverified