SemiHVision: Enhancing Medical Multimodal Models with a Semi-Human Annotated Dataset and Fine-Tuned Instruction Generation | Oct 19, 2024 | Diagnostic GPU | Code Available | 0
ChitroJera: A Regionally Relevant Visual Question Answering Dataset for Bangla | Oct 19, 2024 | Question Answering Visual Question Answering | Unverified | 0
LLaVA-Ultra: Large Chinese Language and Vision Assistant for Ultrasound | Oct 19, 2024 | Instruction Following Knowledge Distillation | Unverified | 0
ViConsFormer: Constituting Meaningful Phrases of Scene Texts using Transformer-based Method in Vietnamese Text-based Visual Question Answering | Oct 18, 2024 | Question Answering Visual Question Answering | Code Available | 0
NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples | Oct 18, 2024 | Attribute Question Answering | Unverified | 0
Latent Image and Video Resolution Prediction using Convolutional Neural Networks | Oct 17, 2024 | Video Quality Assessment Visual Question Answering (VQA) | Unverified | 0
RescueADI: Adaptive Disaster Interpretation in Remote Sensing Images with Autonomous Agents | Oct 17, 2024 | Question Answering Task Planning | Unverified | 0
ActionCOMET: A Zero-shot Approach to Learn Image-specific Commonsense Concepts about Actions | Oct 17, 2024 | Visual Question Answering (VQA) | Code Available | 0
Help Me Identify: Is an LLM+VQA System All We Need to Identify Visual Concepts? | Oct 17, 2024 | All Language Modeling | Code Available | 0
MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models | Oct 16, 2024 | Diagnostic Hallucination | Code Available | 3
WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines | Oct 16, 2024 | Question Answering Visual Question Answering | Code Available | 1
VividMed: Vision Language Model with Versatile Visual Grounding for Medicine | Oct 16, 2024 | Language Modeling Language Modelling | Code Available | 1
SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding | Oct 15, 2024 | Instruction Following Visual Question Answering (VQA) | Unverified | 0
Difficult Task Yes but Simple Task No: Unveiling the Laziness in Multimodal LLMs | Oct 15, 2024 | Image Description Multiple-choice | Code Available | 0
Towards Foundation Models for 3D Vision: How Close Are We? | Oct 14, 2024 | Question Answering Visual Question Answering | Code Available | 1
Eliminating the Language Bias for Visual Question Answering with fine-grained Causal Intervention | Oct 14, 2024 | Contrastive Learning counterfactual | Unverified | 0
LiveXiv -- A Multi-Modal Live Benchmark Based on Arxiv Papers Content | Oct 14, 2024 | Visual Question Answering (VQA) World Knowledge | Code Available | 1
Skipping Computations in Multimodal LLMs | Oct 12, 2024 | Question Answering Visual Question Answering | Code Available | 1
Declarative Knowledge Distillation from Large Language Models for Visual Question Answering Datasets | Oct 12, 2024 | Knowledge Distillation Question Answering | Code Available | 0
ViT3D Alignment of LLaMA3: 3D Medical Image Report Generation | Oct 11, 2024 | Diagnostic Language Modeling | Unverified | 0
Quality Prediction of AI Generated Images and Videos: Emerging Trends and Opportunities | Oct 11, 2024 | Denoising Image Quality Assessment | Unverified | 0
Secure Video Quality Assessment Resisting Adversarial Attacks | Oct 9, 2024 | Adversarial Defense Video Quality Assessment | Unverified | 0
Beyond Captioning: Task-Specific Prompting for Improved VLM Performance in Mathematical Reasoning | Oct 8, 2024 | Image Retrieval Math | Unverified | 0
ERVQA: A Dataset to Benchmark the Readiness of Large Vision Language Models in Hospital Environments | Oct 8, 2024 | Decoder Question Answering | Code Available | 0
DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback | Oct 8, 2024 | Math Sequential Decision Making | Code Available | 1
ActiView: Evaluating Active Perception Ability for Multimodal Large Language Models | Oct 7, 2024 | Question Answering Visual Question Answering | Code Available | 1
MC-CoT: A Modular Collaborative CoT Framework for Zero-shot Medical-VQA with LLM and MLLM Integration | Oct 6, 2024 | Medical Visual Question Answering Question Answering | Code Available | 1
TUBench: Benchmarking Large Vision-Language Models on Trustworthiness with Unanswerable Questions | Oct 5, 2024 | Benchmarking Hallucination | Code Available | 0
BadCM: Invisible Backdoor Attack Against Cross-Modal Learning | Oct 3, 2024 | Backdoor Attack Cross-Modal Retrieval | Code Available | 1
LoGra-Med: Long Context Multi-Graph Alignment for Medical Vision-Language Model | Oct 3, 2024 | image-classification Image Classification | Unverified | 0
Video Instruction Tuning With Synthetic Data | Oct 3, 2024 | 3D Question Answering (3D-QA) | Unverified | 0
Why context matters in VQA and Reasoning: Semantic interventions for VLM input modalities | Oct 2, 2024 | Question Answering Visual Question Answering | Unverified | 0
Backdooring Vision-Language Models with Out-Of-Distribution Data | Oct 2, 2024 | Image Captioning Image to text | Unverified | 0
Unleashing the Potentials of Likelihood Composition for Multi-modal Language Models | Oct 1, 2024 | Question Answering Visual Question Answering | Code Available | 0
BabelBench: An Omni Benchmark for Code-Driven Analysis of Multimodal and Multistructured Data | Oct 1, 2024 | Code Generation Logical Reasoning | Code Available | 0
FMBench: Benchmarking Fairness in Multimodal Large Language Models on Medical Tasks | Oct 1, 2024 | Benchmarking Fairness | Unverified | 0
A Hitchhikers Guide to Fine-Grained Face Forgery Detection Using Common Sense Reasoning | Oct 1, 2024 | Common Sense Reasoning DeepFake Detection | Code Available | 1
T2Vs Meet VLMs: A Scalable Multimodal Dataset for Visual Harmfulness Recognition | Sep 29, 2024 | In-Context Learning Question Answering | Code Available | 1
Visual Question Decomposition on Multimodal Large Language Models | Sep 28, 2024 | Visual Question Answering (VQA) | Unverified | 0
TrojVLM: Backdoor Attack Against Vision Language Models | Sep 28, 2024 | Backdoor Attack Image Captioning | Unverified | 0
3D-CT-GPT: Generating 3D Radiology Reports through Integration of Large Vision-Language Models | Sep 28, 2024 | Diagnostic Language Modeling | Unverified | 0
Charting the Future: Using Chart Question-Answering for Scalable Evaluation of LLM-Driven Data Visualizations | Sep 27, 2024 | Chart Question Answering Question Answering | Unverified | 0
DARE: Diverse Visual Question Answering with Robustness Evaluation | Sep 26, 2024 | image-classification Image Classification | Unverified | 0
ZALM3: Zero-Shot Enhancement of Vision-Language Alignment via In-Context Information in Multi-Turn Multimodal Medical Dialogue | Sep 26, 2024 | Medical Visual Question Answering Question Answering | Unverified | 0
A Unified Hallucination Mitigation Framework for Large Vision-Language Models | Sep 24, 2024 | Hallucination Question Answering | Code Available | 0
MediConfusion: Can you trust your AI radiologist? Probing the reliability of multimodal medical foundation models | Sep 23, 2024 | Medical Visual Question Answering Question Answering | Code Available | 1
Advancing Video Quality Assessment for AIGC | Sep 23, 2024 | Image Generation Text Generation | Unverified | 0
Revisiting Video Quality Assessment from the Perspective of Generalization | Sep 23, 2024 | Image Quality Assessment Video Quality Assessment | Code Available | 0
Detect, Describe, Discriminate: Moving Beyond VQA for MLLM Evaluation | Sep 23, 2024 | Multiple-choice Question Answering | Unverified | 0
Towards Efficient and Robust VQA-NLE Data Generation with Large Vision-Language Models | Sep 23, 2024 | Decision Making Question Answering | Code Available | 0