FMBench: Benchmarking Fairness in Multimodal Large Language Models on Medical Tasks | Oct 1, 2024 | Benchmarking Fairness | Unverified
TrojVLM: Backdoor Attack Against Vision Language Models | Sep 28, 2024 | Backdoor Attack Image Captioning | Unverified
3D-CT-GPT: Generating 3D Radiology Reports through Integration of Large Vision-Language Models | Sep 28, 2024 | Diagnostic Language Modeling | Unverified
Visual Question Decomposition on Multimodal Large Language Models | Sep 28, 2024 | Visual Question Answering (VQA) | Unverified
Charting the Future: Using Chart Question-Answering for Scalable Evaluation of LLM-Driven Data Visualizations | Sep 27, 2024 | Chart Question Answering Question Answering | Unverified
ZALM3: Zero-Shot Enhancement of Vision-Language Alignment via In-Context Information in Multi-Turn Multimodal Medical Dialogue | Sep 26, 2024 | Medical Visual Question Answering Question Answering | Unverified
DARE: Diverse Visual Question Answering with Robustness Evaluation | Sep 26, 2024 | image-classification Image Classification | Unverified
A Unified Hallucination Mitigation Framework for Large Vision-Language Models | Sep 24, 2024 | Hallucination Question Answering | Code Available
Advancing Video Quality Assessment for AIGC | Sep 23, 2024 | Image Generation Text Generation | Unverified
Detect, Describe, Discriminate: Moving Beyond VQA for MLLM Evaluation | Sep 23, 2024 | Multiple-choice Question Answering | Unverified
Revisiting Video Quality Assessment from the Perspective of Generalization | Sep 23, 2024 | Image Quality Assessment Video Quality Assessment | Code Available
Towards Efficient and Robust VQA-NLE Data Generation with Large Vision-Language Models | Sep 23, 2024 | Decision Making Question Answering | Code Available
@Bench: Benchmarking Vision-Language Models for Human-centered Assistive Technology | Sep 21, 2024 | Benchmarking Depth Estimation | Unverified
JourneyBench: A Challenging One-Stop Vision-Language Understanding Benchmark of Generated Images | Sep 19, 2024 | Hallucination Image Captioning | Code Available
Sparks of Artificial General Intelligence(AGI) in Semiconductor Material Science: Early Explorations into the Next Frontier of Generative AI-Assisted Electron Micrograph Analysis | Sep 17, 2024 | In-Context Learning Question Answering | Unverified
CAST: Cross-modal Alignment Similarity Test for Vision Language Models | Sep 17, 2024 | cross-modal alignment Question Answering | Code Available
QTG-VQA: Question-Type-Guided Architectural for VideoQA Systems | Sep 14, 2024 | Question Answering Video Question Answering | Unverified
Guiding Vision-Language Model Selection for Visual Question-Answering Across Tasks, Domains, and Knowledge Types | Sep 14, 2024 | Language Modeling Language Modelling | Code Available
Learning to Compress Contexts for Efficient Knowledge-based Visual Question Answering | Sep 11, 2024 | Question Answering Visual Question Answering | Unverified
COLUMBUS: Evaluating COgnitive Lateral Understanding through Multiple-choice reBUSes | Sep 6, 2024 | Multiple-choice Question Answering | Code Available
How to Determine the Preferred Image Distribution of a Black-Box Vision-Language Model? | Sep 3, 2024 | In-Context Learning Language Modeling | Code Available
Kvasir-VQA: A Text-Image Pair GI Tract Dataset | Sep 2, 2024 | Image Captioning Image Generation | Code Available
Retrieval-Augmented Natural Language Reasoning for Explainable Visual Question Answering | Aug 30, 2024 | Decoder Language Modeling | Unverified
Look, Learn and Leverage (L^3): Mitigating Visual-Domain Shift and Discovering Intrinsic Relations via Symbolic Alignment | Aug 30, 2024 | Question Answering Representation Learning | Unverified
GradBias: Unveiling Word Influence on Bias in Text-to-Image Generative Models | Aug 29, 2024 | Bias Detection Fairness | Code Available
M4CXR: Exploring Multi-task Potentials of Multi-modal Large Language Models for Chest X-ray Interpretation | Aug 29, 2024 | Instruction Following Medical Report Generation | Unverified
Can Visual Language Models Replace OCR-Based Visual Question Answering Pipelines in Production? A Case Study in Retail | Aug 28, 2024 | Optical Character Recognition Optical Character Recognition (OCR) | Unverified
Can SAR improve RSVQA performance? | Aug 28, 2024 | Question Answering Visual Question Answering | Unverified
Zero-Shot Visual Reasoning by Vision-Language Models: Benchmarking and Analysis | Aug 27, 2024 | Benchmarking Large Language Model | Unverified
Multi-Modal Instruction-Tuning Small-Scale Language-and-Vision Assistant for Semiconductor Electron Micrograph Analysis | Aug 27, 2024 | Instruction Following Question Answering | Unverified
LMM-VQA: Advancing Video Quality Assessment with Large Multimodal Models | Aug 26, 2024 | Large Language Model Video Quality Assessment | Code Available
Towards Human-Level Understanding of Complex Process Engineering Schematics: A Pedagogical, Introspective Multi-Agent Framework for Open-Domain Question Answering | Aug 24, 2024 | knowledge editing Open-Domain Question Answering | Unverified
Foundational Model for Electron Micrograph Analysis: Instruction-Tuning Small-Scale Language-and-Vision Assistant for Enterprise Adoption | Aug 23, 2024 | Instruction Following Knowledge Distillation | Unverified
Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese | Aug 22, 2024 | Language Modeling Language Modelling | Unverified
Swarm Intelligence in Geo-Localization: A Multi-Agent Large Vision-Language Model Collaborative Framework | Aug 21, 2024 | geo-localization Language Modeling | Unverified
Med-PMC: Medical Personalized Multi-modal Consultation with a Proactive Ask-First-Observe-Next Paradigm | Aug 16, 2024 | Decision Making Medical Visual Question Answering | Code Available
Beyond the Hype: A dispassionate look at vision-language models in medical scenario | Aug 16, 2024 | Question Answering Spatial Reasoning | Unverified
Enhancing Visual Question Answering through Ranking-Based Hybrid Training and Multimodal Fusion | Aug 14, 2024 | Question Answering Visual Question Answering | Unverified
Subjective and Objective Quality Assessment of Rendered Human Avatar Videos in Virtual Reality | Aug 13, 2024 | Video Compression Video Quality Assessment | Unverified
Long-Form Answers to Visual Questions from Blind and Low Vision People | Aug 12, 2024 | Form Visual Question Answering (VQA) | Unverified
Efficient Quantum Gradient and Higher-order Derivative Estimation via Generalized Hadamard Test | Aug 10, 2024 | Visual Question Answering (VQA) | Unverified
Revisiting Multi-Modal LLM Evaluation | Aug 9, 2024 | Chart Understanding Optical Character Recognition | Unverified
Targeted Visual Prompting for Medical Visual Question Answering | Aug 6, 2024 | Medical Visual Question Answering Question Answering | Code Available
LLaVA-OneVision: Easy Visual Task Transfer | Aug 6, 2024 | 3D Question Answering (3D-QA) | Code Available
Towards Flexible Evaluation for Generative Visual Question Answering | Aug 1, 2024 | Decoder Generative Visual Question Answering | Code Available
Benchmarking Multi-dimensional AIGC Video Quality Assessment: A Dataset and Unified Model | Jul 31, 2024 | Benchmarking Large Language Model | Code Available
Prompting Medical Large Vision-Language Models to Diagnose Pathologies by Visual Question Answering | Jul 31, 2024 | Diagnostic Hallucination | Unverified
SimpleLLM4AD: An End-to-End Vision-Language Model with Graph Visual Question Answering for Autonomous Driving | Jul 31, 2024 | Autonomous Driving Language Modeling | Unverified
Bayesian Low-Rank LeArning (Bella): A Practical Approach to Bayesian Neural Networks | Jul 30, 2024 | Visual Question Answering (VQA) | Code Available
Highly Efficient No-reference 4K Video Quality Assessment with Full-Pixel Covering Sampling and Training Strategy | Jul 30, 2024 | 4k Video Quality Assessment | Unverified