Visual question answering based evaluation metrics for text-to-image generation | Nov 15, 2024 | Image Generation, Image Manipulation
Unverified | 0 | Is Cognition consistent with Perception? Assessing and Mitigating Multimodal Knowledge Conflicts in Document Understanding | Nov 12, 2024 | document understanding, Optical Character Recognition (OCR)
Unverified | 0 | SparrowVQE: Visual Question Explanation for Course Content Understanding | Nov 12, 2024 | Question Answering, Visual Question Answering
Code Available | 0 | Aligned Vector Quantization for Edge-Cloud Collabrative Vision-Language Models | Nov 8, 2024 | Quantization, Question Answering
Unverified | 0 | Select2Plan: Training-Free ICL-Based Planning through VQA and Memory Retrieval | Nov 6, 2024 | Autonomous Navigation, In-Context Learning
Unverified | 0 | NeurIPS 2023 Competition: Privacy Preserving Federated Learning Document VQA | Nov 6, 2024 | Federated Learning, Language Modelling
Unverified | 0 | Multimodal Commonsense Knowledge Distillation for Visual Question Answering | Nov 5, 2024 | Knowledge Distillation, Question Answering
Unverified | 0 | MME-Finance: A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning | Nov 5, 2024 | MME, Question Answering
Unverified | 0 | One VLM to Keep it Learning: Generation and Balancing for Data-free Continual Visual Question Answering | Nov 4, 2024 | Continual Learning, Question Answering
Unverified | 0 | A Visual Question Answering Method for SAR Ship: Breaking the Requirement for Multimodal Dataset Construction and Model Fine-Tuning | Nov 3, 2024 | object-detection, Object Detection
Unverified | 0 | Goal-Oriented Semantic Communication for Wireless Visual Question Answering | Nov 3, 2024 | Edge-computing, Question Answering
Unverified | 0 | Right this way: Can VLMs Guide Us to See More to Answer Questions? | Nov 1, 2024 | Question Answering, Visual Question Answering
Code Available | 0 | Aggregate-and-Adapt Natural Language Prompts for Downstream Generalization of CLIP | Oct 31, 2024 | Image Captioning, Prompt Learning
Unverified | 0 | SimpsonsVQA: Enhancing Inquiry-Based Learning with a Tailored Dataset | Oct 30, 2024 | Question Answering, Visual Question Answering
Unverified | 0 | Are VLMs Really Blind | Oct 29, 2024 | Language Modeling, Language Modelling
Code Available | 0 | Improving Generalization in Visual Reasoning via Self-Ensemble | Oct 28, 2024 | Visual Question Answering (VQA), Visual Reasoning
Unverified | 0 | AutoBench-V: Can Large Vision-Language Models Benchmark Themselves? | Oct 28, 2024 | Benchmarking, Question Answering
Code Available | 0 | Attention Overlap Is Responsible for The Entity Missing Problem in Text-to-image Diffusion Models! | Oct 28, 2024 | Denoising, Question Answering
Unverified | 0 | Few-Shot Multimodal Explanation for Visual Question Answering | Oct 28, 2024 | Explainable artificial intelligence, Explainable Artificial Intelligence (XAI)
Code Available | 0 | Efficient Bilinear Attention-based Fusion for Medical Visual Question Answering | Oct 28, 2024 | Computational Efficiency, Decision Making
Unverified | 0 | R-LLaVA: Improving Med-VQA Understanding through Visual Region of Interest | Oct 27, 2024 | Medical Visual Question Answering, Multiple-choice
Unverified | 0 | GPT-4o System Card | Oct 25, 2024 | Multiple-choice, Spatial Reasoning
Unverified | 0 | Which Client is Reliable?: A Reliable and Personalized Prompt-based Federated Learning for Medical Image Question Answering | Oct 23, 2024 | Federated Learning, Medical Visual Question Answering
Unverified | 0 | Visual Question Answering in Ophthalmology: A Progressive and Practical Perspective | Oct 22, 2024 | Question Answering, Visual Question Answering
Unverified | 0 | Griffon-G: Bridging Vision-Language and Vision-Centric Tasks via Large Multimodal Models | Oct 21, 2024 | Instruction Following, object-detection
Unverified | 0 | LLaVA-Ultra: Large Chinese Language and Vision Assistant for Ultrasound | Oct 19, 2024 | Instruction Following, Knowledge Distillation
Unverified | 0 | ChitroJera: A Regionally Relevant Visual Question Answering Dataset for Bangla | Oct 19, 2024 | Question Answering, Visual Question Answering
Unverified | 0 | SemiHVision: Enhancing Medical Multimodal Models with a Semi-Human Annotated Dataset and Fine-Tuned Instruction Generation | Oct 19, 2024 | Diagnostic, GPU
Code Available | 0 | NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples | Oct 18, 2024 | Attribute, Question Answering
Unverified | 0 | ViConsFormer: Constituting Meaningful Phrases of Scene Texts using Transformer-based Method in Vietnamese Text-based Visual Question Answering | Oct 18, 2024 | Question Answering, Visual Question Answering
Code Available | 0 | Latent Image and Video Resolution Prediction using Convolutional Neural Networks | Oct 17, 2024 | Video Quality Assessment, Visual Question Answering (VQA)
Unverified | 0 | Help Me Identify: Is an LLM+VQA System All We Need to Identify Visual Concepts? | Oct 17, 2024 | All, Language Modeling
Code Available | 0 | RescueADI: Adaptive Disaster Interpretation in Remote Sensing Images with Autonomous Agents | Oct 17, 2024 | Question Answering, Task Planning
Unverified | 0 | ActionCOMET: A Zero-shot Approach to Learn Image-specific Commonsense Concepts about Actions | Oct 17, 2024 | Visual Question Answering (VQA)
Code Available | 0 | Difficult Task Yes but Simple Task No: Unveiling the Laziness in Multimodal LLMs | Oct 15, 2024 | Image Description, Multiple-choice
Code Available | 0 | SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding | Oct 15, 2024 | Instruction Following, Visual Question Answering (VQA)
Unverified | 0 | Eliminating the Language Bias for Visual Question Answering with fine-grained Causal Intervention | Oct 14, 2024 | Contrastive Learning, counterfactual
Unverified | 0 | Declarative Knowledge Distillation from Large Language Models for Visual Question Answering Datasets | Oct 12, 2024 | Knowledge Distillation, Question Answering
Code Available | 0 | Quality Prediction of AI Generated Images and Videos: Emerging Trends and Opportunities | Oct 11, 2024 | Denoising, Image Quality Assessment
Unverified | 0 | ViT3D Alignment of LLaMA3: 3D Medical Image Report Generation | Oct 11, 2024 | Diagnostic, Language Modeling
Unverified | 0 | Secure Video Quality Assessment Resisting Adversarial Attacks | Oct 9, 2024 | Adversarial Defense, Video Quality Assessment
Unverified | 0 | Beyond Captioning: Task-Specific Prompting for Improved VLM Performance in Mathematical Reasoning | Oct 8, 2024 | Image Retrieval, Math
Unverified | 0 | ERVQA: A Dataset to Benchmark the Readiness of Large Vision Language Models in Hospital Environments | Oct 8, 2024 | Decoder, Question Answering
Code Available | 0 | TUBench: Benchmarking Large Vision-Language Models on Trustworthiness with Unanswerable Questions | Oct 5, 2024 | Benchmarking, Hallucination
Code Available | 0 | Video Instruction Tuning With Synthetic Data | Oct 3, 2024 | 3D Question Answering (3D-QA)
Unverified | 0 | LoGra-Med: Long Context Multi-Graph Alignment for Medical Vision-Language Model | Oct 3, 2024 | image-classification, Image Classification
Unverified | 0 | Backdooring Vision-Language Models with Out-Of-Distribution Data | Oct 2, 2024 | Image Captioning, Image to text
Unverified | 0 | Why context matters in VQA and Reasoning: Semantic interventions for VLM input modalities | Oct 2, 2024 | Question Answering, Visual Question Answering
Unverified | 0 | BabelBench: An Omni Benchmark for Code-Driven Analysis of Multimodal and Multistructured Data | Oct 1, 2024 | Code Generation, Logical Reasoning
Code Available | 0 | FMBench: Benchmarking Fairness in Multimodal Large Language Models on Medical Tasks | Oct 1, 2024 | Benchmarking, Fairness
Unverified | 0