SNAP: A Benchmark for Testing the Effects of Capture Conditions on Fundamental Vision Tasks | May 21, 2025 | Image Classification | Code | Code Available | 0
CP-LLM: Context and Pixel Aware Large Language Model for Video Quality Assessment | May 21, 2025 | Language Modeling Language Modelling | - | Unverified | 0
Prolonged Reasoning Is Not All You Need: Certainty-Based Adaptive Routing for Efficient LLM/MLLM Reasoning | May 21, 2025 | All Visual Question Answering (VQA) | - | Unverified | 0
Robo2VLM: Visual Question Answering from Large-Scale In-the-Wild Robot Manipulation Datasets | May 21, 2025 | Dataset Generation Descriptive | - | Unverified | 0
Visual Question Answering on Multiple Remote Sensing Image Modalities | May 21, 2025 | Question Answering Visual Question Answering | - | Unverified | 0
TinyDrive: Multiscale Visual Question Answering with Selective Token Routing for Autonomous Driving | May 21, 2025 | Autonomous Driving Question Answering | - | Unverified | 0
PlanGPT-VL: Enhancing Urban Planning with Domain-Specific Vision-Language Models | May 20, 2025 | Visual Question Answering (VQA) | - | Unverified | 0
Debating for Better Reasoning: An Unsupervised Multimodal Approach | May 20, 2025 | Question Answering Visual Question Answering | - | Unverified | 0
Toward Effective Reinforcement Learning Fine-Tuning for Medical VQA in Vision-Language Models | May 20, 2025 | Medical Visual Question Answering Question Answering | - | Unverified | 0
TinyRS-R1: Compact Multimodal Language Model for Remote Sensing | May 17, 2025 | Language Modeling Language Modelling | - | Unverified | 0
RVTBench: A Benchmark for Visual Reasoning Tasks | May 17, 2025 | Reasoning Segmentation Visual Question Answering (VQA) | Code | Code Available | 0
MedSG-Bench: A Benchmark for Medical Image Sequences Grounding | May 17, 2025 | Visual Grounding Visual Question Answering (VQA) | - | Unverified | 0
HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation | May 16, 2025 | Benchmarking Ethics | Code | Code Available | 0
Semantically-Aware Game Image Quality Assessment | May 16, 2025 | Feature Importance Image Quality Assessment | - | Unverified | 0
TCC-Bench: Benchmarking the Traditional Chinese Culture Understanding Capabilities of MLLMs | May 16, 2025 | Benchmarking Question Answering | Code | Code Available | 0
Enhancing Multi-Image Question Answering via Submodular Subset Selection | May 15, 2025 | Question Answering Retrieval | - | Unverified | 0
Variational Visual Question Answering | May 14, 2025 | Question Answering Visual Question Answering | - | Unverified | 0
OMGM: Orchestrate Multiple Granularities and Modalities for Efficient Multimodal Retrieval | May 10, 2025 | Cross-Modal Retrieval Question Answering | - | Unverified | 0
Natural Reflection Backdoor Attack on Vision Language Model for Autonomous Driving | May 9, 2025 | Autonomous Driving Backdoor Attack | - | Unverified | 0
R^3-VQA: "Read the Room" by Video Social Reasoning | May 7, 2025 | State Estimation Visual Question Answering (VQA) | - | Unverified | 0
DiffVQA: Video Quality Assessment Using Diffusion Feature Extractor | May 6, 2025 | Mamba Video Quality Assessment | - | Unverified | 0
Breaking Annotation Barriers: Generalized Video Quality Assessment via Ranking-based Self-Supervision | May 6, 2025 | Learning-To-Rank Self-Supervised Learning | Code | Code Available | 0
Task-Oriented Semantic Communication in Large Multimodal Models-based Vehicle Networks | May 5, 2025 | Question Answering Semantic Communication | - | Unverified | 0
AOR: Anatomical Ontology-Guided Reasoning for Medical Large Multimodal Model in Chest X-Ray Interpretation | May 5, 2025 | Anatomy Diagnostic | - | Unverified | 0
AdCare-VLM: Leveraging Large Vision Language Model (LVLM) to Monitor Long-Term Medication Adherence and Care | May 1, 2025 | Language Modeling Language Modelling | Code | Code Available | 0
SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models | May 1, 2025 | Spatial Reasoning Visual Question Answering (VQA) | - | Unverified | 0
Unlearning Sensitive Information in Multimodal LLMs: Benchmark and Attack-Defense Evaluation | May 1, 2025 | Question Answering Specificity | Code | Code Available | 0
Localizing Before Answering: A Hallucination Evaluation Benchmark for Grounded Medical Multimodal LLMs | Apr 30, 2025 | Hallucination Hallucination Evaluation | - | Unverified | 0
Data-Driven Calibration of Prediction Sets in Large Vision-Language Models Based on Inductive Conformal Prediction | Apr 24, 2025 | Conformal Prediction Hallucination | - | Unverified | 0
A Comprehensive Survey of Knowledge-Based Vision Question Answering Systems: The Lifecycle of Knowledge in Visual Reasoning Task | Apr 24, 2025 | Question Answering Retrieval | - | Unverified | 0
Instruction-augmented Multimodal Alignment for Image-Text and Element Matching | Apr 16, 2025 | Image Augmentation Image Generation | - | Unverified | 0
Bridging the Semantic Gaps: Improving Medical VQA Consistency with LLM-Augmented Question Sets | Apr 16, 2025 | Diversity Medical Visual Question Answering | - | Unverified | 0
DVLTA-VQA: Decoupled Vision-Language Modeling with Text-Guided Adaptation for Blind Video Quality Assessment | Apr 16, 2025 | Language Modeling Language Modelling | - | Unverified | 0
PuzzleBench: A Fully Dynamic Evaluation Framework for Large Multimodal Models on Puzzle Solving | Apr 15, 2025 | Logical Reasoning Visual Question Answering (VQA) | - | Unverified | 0
QAVA: Query-Agnostic Visual Attack to Large Vision-Language Models | Apr 15, 2025 | Question Answering Visual Question Answering | Code | Code Available | 0
MMKB-RAG: A Multi-Modal Knowledge-Based Retrieval-Augmented Generation Framework | Apr 14, 2025 | Question Answering RAG | - | Unverified | 0
Building Trustworthy Multimodal AI: A Review of Fairness, Transparency, and Ethics in Vision-Language Tasks | Apr 14, 2025 | Ethics Fairness | - | Unverified | 0
NoTeS-Bank: Benchmarking Neural Transcription and Search for Scientific Notes Understanding | Apr 12, 2025 | Benchmarking Document AI | - | Unverified | 0
FVQ: A Large-Scale Dataset and A LMM-based Method for Face Video Quality Assessment | Apr 12, 2025 | Video Quality Assessment Visual Question Answering (VQA) | Code | Code Available | 0
PathVLM-R1: A Reinforcement Learning-Driven Reasoning Model for Pathology Visual-Language Tasks | Apr 12, 2025 | Computed Tomography (CT) Question Answering | - | Unverified | 0
TokenFocus-VQA: Enhancing Text-to-Image Alignment with Position-Aware Focus and Multi-Perspective Aggregations on LVLMs | Apr 10, 2025 | Ensemble Learning Position | - | Unverified | 0
UniRVQA: A Unified Framework for Retrieval-Augmented Vision Question Answering via Self-Reflective Joint Training | Apr 5, 2025 | Articles Question Answering | - | Unverified | 0
QIRL: Boosting Visual Question Answering via Optimized Question-Image Relation Learning | Apr 4, 2025 | Data Augmentation Image Generation | - | Unverified | 0
Hierarchical Modeling for Medical Visual Question Answering with Cross-Attention Fusion | Apr 4, 2025 | Diagnostic Medical Visual Question Answering | - | Unverified | 0
SocialGesture: Delving into Multi-person Gesture Understanding | Apr 3, 2025 | Gesture Recognition Question Answering | - | Unverified | 0
Reasoning LLMs for User-Aware Multimodal Conversational Agents | Apr 2, 2025 | RAG Retrieval-augmented Generation | - | Unverified | 0
MPDrive: Improving Spatial Understanding with Marker-Based Prompt Learning for Autonomous Driving | Apr 1, 2025 | Autonomous Driving Prompt Learning | - | Unverified | 0
KOFFVQA: An Objectively Evaluated Free-form VQA Benchmark for Large Vision-Language Models in the Korean Language | Mar 31, 2025 | Form Question Answering | Code | Code Available | 0
How Well Can Vison-Language Models Understand Humans' Intention? An Open-ended Theory of Mind Question Evaluation Benchmark | Mar 28, 2025 | Question Answering Visual Question Answering | - | Unverified | 0
Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields | Mar 26, 2025 | Question Answering Visual Question Answering | - | Unverified | 0