@Bench: Benchmarking Vision-Language Models for Human-centered Assistive Technology | Sep 21, 2024 | Benchmarking Depth Estimation | Unverified | 0
JourneyBench: A Challenging One-Stop Vision-Language Understanding Benchmark of Generated Images | Sep 19, 2024 | Hallucination Image Captioning | Code Available | 0
Evaluating Image Hallucination in Text-to-Image Generation with Question-Answering | Sep 19, 2024 | Hallucination Hallucination Evaluation | Code Available | 1
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | Sep 18, 2024 | Natural Language Visual Grounding | Code Available | 11
Sparks of Artificial General Intelligence(AGI) in Semiconductor Material Science: Early Explorations into the Next Frontier of Generative AI-Assisted Electron Micrograph Analysis | Sep 17, 2024 | In-Context Learning Question Answering | Unverified | 0
Less is More: A Simple yet Effective Token Reduction Method for Efficient Multi-modal LLMs | Sep 17, 2024 | Question Answering Token Reduction | Code Available | 1
CAST: Cross-modal Alignment Similarity Test for Vision Language Models | Sep 17, 2024 | cross-modal alignment Question Answering | Code Available | 0
Guiding Vision-Language Model Selection for Visual Question-Answering Across Tasks, Domains, and Knowledge Types | Sep 14, 2024 | Language Modeling Language Modelling | Code Available | 0
QTG-VQA: Question-Type-Guided Architectural for VideoQA Systems | Sep 14, 2024 | Question Answering Video Question Answering | Unverified | 0
Learning to Compress Contexts for Efficient Knowledge-based Visual Question Answering | Sep 11, 2024 | Question Answering Visual Question Answering | Unverified | 0
COLUMBUS: Evaluating COgnitive Lateral Understanding through Multiple-choice reBUSes | Sep 6, 2024 | Multiple-choice Question Answering | Code Available | 0
How to Determine the Preferred Image Distribution of a Black-Box Vision-Language Model? | Sep 3, 2024 | In-Context Learning Language Modeling | Code Available | 0
Kvasir-VQA: A Text-Image Pair GI Tract Dataset | Sep 2, 2024 | Image Captioning Image Generation | Code Available | 0
Look, Learn and Leverage (L^3): Mitigating Visual-Domain Shift and Discovering Intrinsic Relations via Symbolic Alignment | Aug 30, 2024 | Question Answering Representation Learning | Unverified | 0
Retrieval-Augmented Natural Language Reasoning for Explainable Visual Question Answering | Aug 30, 2024 | Decoder Language Modeling | Unverified | 0
M4CXR: Exploring Multi-task Potentials of Multi-modal Large Language Models for Chest X-ray Interpretation | Aug 29, 2024 | Instruction Following Medical Report Generation | Unverified | 0
GradBias: Unveiling Word Influence on Bias in Text-to-Image Generative Models | Aug 29, 2024 | Bias Detection Fairness | Code Available | 0
Can Visual Language Models Replace OCR-Based Visual Question Answering Pipelines in Production? A Case Study in Retail | Aug 28, 2024 | Optical Character Recognition Optical Character Recognition (OCR) | Unverified | 0
Can SAR improve RSVQA performance? | Aug 28, 2024 | Question Answering Visual Question Answering | Unverified | 0
Multi-Modal Instruction-Tuning Small-Scale Language-and-Vision Assistant for Semiconductor Electron Micrograph Analysis | Aug 27, 2024 | Instruction Following Question Answering | Unverified | 0
Zero-Shot Visual Reasoning by Vision-Language Models: Benchmarking and Analysis | Aug 27, 2024 | Benchmarking Large Language Model | Unverified | 0
LMM-VQA: Advancing Video Quality Assessment with Large Multimodal Models | Aug 26, 2024 | Large Language Model Video Quality Assessment | Code Available | 0
Towards Human-Level Understanding of Complex Process Engineering Schematics: A Pedagogical, Introspective Multi-Agent Framework for Open-Domain Question Answering | Aug 24, 2024 | knowledge editing Open-Domain Question Answering | Unverified | 0
Foundational Model for Electron Micrograph Analysis: Instruction-Tuning Small-Scale Language-and-Vision Assistant for Enterprise Adoption | Aug 23, 2024 | Instruction Following Knowledge Distillation | Unverified | 0
Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese | Aug 22, 2024 | Language Modeling Language Modelling | Unverified | 0
AIM 2024 Challenge on Compressed Video Quality Assessment: Methods and Results | Aug 21, 2024 | Image Manipulation valid | Code Available | 1
Swarm Intelligence in Geo-Localization: A Multi-Agent Large Vision-Language Model Collaborative Framework | Aug 21, 2024 | geo-localization Language Modeling | Unverified | 0
VE-Bench: Subjective-Aligned Benchmark Suite for Text-Driven Video Editing Quality Assessment | Aug 21, 2024 | Video Alignment Video Editing | Code Available | 2
V-RoAst: Visual Road Assessment. Can VLM be a Road Safety Assessor Using the iRAP Standard? | Aug 20, 2024 | Few-Shot Learning In-Context Learning | Code Available | 1
FFAA: Multimodal Large Language Model based Explainable Open-World Face Forgery Analysis Assistant | Aug 19, 2024 | Descriptive Face Swapping | Code Available | 1
PA-LLaVA: A Large Language-Vision Assistant for Human Pathology Image Understanding | Aug 18, 2024 | Language Modelling Question Answering | Code Available | 2
Beyond the Hype: A dispassionate look at vision-language models in medical scenario | Aug 16, 2024 | Question Answering Spatial Reasoning | Unverified | 0
Visual Agents as Fast and Slow Thinkers | Aug 16, 2024 | Question Answering Reasoning Segmentation | Code Available | 1
Med-PMC: Medical Personalized Multi-modal Consultation with a Proactive Ask-First-Observe-Next Paradigm | Aug 16, 2024 | Decision Making Medical Visual Question Answering | Code Available | 0
Enhancing Visual Question Answering through Ranking-Based Hybrid Training and Multimodal Fusion | Aug 14, 2024 | Question Answering Visual Question Answering | Unverified | 0
Subjective and Objective Quality Assessment of Rendered Human Avatar Videos in Virtual Reality | Aug 13, 2024 | Video Compression Video Quality Assessment | Unverified | 0
Long-Form Answers to Visual Questions from Blind and Low Vision People | Aug 12, 2024 | Form Visual Question Answering (VQA) | Unverified | 0
SWIFT:A Scalable lightWeight Infrastructure for Fine-Tuning | Aug 10, 2024 | Hallucination Optical Character Recognition | Code Available | 11
Efficient Quantum Gradient and Higher-order Derivative Estimation via Generalized Hadamard Test | Aug 10, 2024 | Visual Question Answering (VQA) | Unverified | 0
Surgical-VQLA++: Adversarial Contrastive Learning for Calibrated Robust Visual Question-Localized Answering in Robotic Surgery | Aug 9, 2024 | Contrastive Learning Medical Visual Question Answering | Code Available | 1
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models | Aug 9, 2024 | Language Modeling Language Modelling | Code Available | 7
Revisiting Multi-Modal LLM Evaluation | Aug 9, 2024 | Chart Understanding Optical Character Recognition | Unverified | 0
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI | Aug 6, 2024 | Question Answering Visual Question Answering | Code Available | 2
LLaVA-OneVision: Easy Visual Task Transfer | Aug 6, 2024 | 3D Question Answering (3D-QA) | Code Available | 0
Targeted Visual Prompting for Medical Visual Question Answering | Aug 6, 2024 | Medical Visual Question Answering Question Answering | Code Available | 0
MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine | Aug 6, 2024 | Medical Visual Question Answering Organ Detection | Code Available | 3
Towards Flexible Evaluation for Generative Visual Question Answering | Aug 1, 2024 | Decoder Generative Visual Question Answering | Code Available | 0
Benchmarking Multi-dimensional AIGC Video Quality Assessment: A Dataset and Unified Model | Jul 31, 2024 | Benchmarking Large Language Model | Code Available | 0
SimpleLLM4AD: An End-to-End Vision-Language Model with Graph Visual Question Answering for Autonomous Driving | Jul 31, 2024 | Autonomous Driving Language Modeling | Unverified | 0
Prompting Medical Large Vision-Language Models to Diagnose Pathologies by Visual Question Answering | Jul 31, 2024 | Diagnostic Hallucination | Unverified | 0