How Far Can Off-the-Shelf Multimodal Large Language Models Go in Online Episodic Memory Question Answering? | Jun 19, 2025 | Multiple-choice Question Answering
— Unverified 0 | Can Common VLMs Rival Medical VLMs? Evaluation and Strategic Insights | Jun 19, 2025 | Question Answering Visual Question Answering
— Unverified 0 | MEGC2025: Micro-Expression Grand Challenge on Spot Then Recognize and Visual Question Answering | Jun 18, 2025 | Multimodal Reasoning Question Answering
— Unverified 0 | Adapting Lightweight Vision Language Models for Radiological Visual Question Answering | Jun 17, 2025 | Diagnostic Question Answering
Code Code Available 0 | ASCD: Attention-Steerable Contrastive Decoding for Reducing Hallucination in MLLM | Jun 17, 2025 | Hallucination Language Modeling
— Unverified 0 | Connecting phases of matter to the flatness of the loss landscape in analog variational quantum algorithms | Jun 16, 2025 | Visual Question Answering (VQA)
— Unverified 0 | CAPO: Reinforcing Consistent Reasoning in Medical Decision-Making | Jun 15, 2025 | Answer Generation Decision Making
— Unverified 0 | EyeSim-VQA: A Free-Energy-Guided Eye Simulation Framework for Video Quality Assessment | Jun 13, 2025 | Image Quality Assessment Video Quality Assessment
— Unverified 0 | SlotPi: Physics-informed Object-centric Reasoning Models | Jun 12, 2025 | Object Question Answering
Code Code Available 0 | Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning | Jun 12, 2025 | Attribute Multimodal Reasoning
— Unverified 0 | HalLoc: Token-level Localization of Hallucinations for Vision Language Models | Jun 12, 2025 | Hallucination Image Captioning
Code Code Available 0 | Outside Knowledge Conversational Video (OKCV) Dataset -- Dialoguing over Videos | Jun 11, 2025 | Question Answering Visual Question Answering
Code Code Available 0 | Kvasir-VQA-x1: A Multimodal Dataset for Medical Reasoning and Robust MedVQA in Gastrointestinal Endoscopy | Jun 11, 2025 | Medical Visual Question Answering Question Answering
Code Code Available 0 | Provoking Multi-modal Few-Shot LVLM via Exploration-Exploitation In-Context Learning | Jun 11, 2025 | In-Context Learning Question Answering
— Unverified 0 | PhyBlock: A Progressive Benchmark for Physical Understanding and Planning via 3D Block Assembly | Jun 10, 2025 | Question Answering Scene Understanding
— Unverified 0 | From Pixels to Graphs: using Scene and Knowledge Graphs for HD-EPIC VQA Challenge | Jun 10, 2025 | Knowledge Graphs Language Modeling
— Unverified 0 | Looking Beyond Visible Cues: Implicit Video Question Answering via Dual-Clue Reasoning | Jun 9, 2025 | Future prediction Question Answering
Code Code Available 0 | HAIBU-ReMUD: Reasoning Multimodal Ultrasound Dataset and Model Bridging to General Specific Domains | Jun 9, 2025 | Diagnostic Question Answering
Code Code Available 0 | Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning | Jun 8, 2025 | Medical Report Generation Question Answering
— Unverified 0 | Ontology-based knowledge representation for bone disease diagnosis: a foundation for safe and sustainable medical artificial intelligence systems | Jun 5, 2025 | Diagnostic Multimodal Deep Learning
— Unverified 0 | ReXVQA: A Large-scale Visual Question Answering Benchmark for Generalist Chest X-ray Understanding | Jun 4, 2025 | Negation Negation Detection
— Unverified 0 | CoRe-MMRAG: Cross-Source Knowledge Reconciliation for Multimodal RAG | Jun 3, 2025 | Answer Generation RAG
— Unverified 0 | Fast or Slow? Integrating Fast Intuition and Deliberate Thinking for Enhancing Visual Question Answering | Jun 1, 2025 | All MME
— Unverified 0 | MMedAgent-RL: Optimizing Multi-Agent Collaboration for Multimodal Medical Reasoning | May 31, 2025 | Diagnostic Reinforcement Learning (RL)
— Unverified 0 | Vision LLMs Are Bad at Hierarchical Visual Understanding, and LLMs Are the Bottleneck | May 30, 2025 | Question Answering Visual Question Answering
— Unverified 0 | Proxy-FDA: Proxy-based Feature Distribution Alignment for Fine-tuning Vision Foundation Models without Forgetting | May 30, 2025 | image-classification Image Classification
— Unverified 0 | Multi-Sourced Compositional Generalization in Visual Question Answering | May 29, 2025 | Question Answering Visual Question Answering
Code Code Available 0 | A Comprehensive Evaluation of Multi-Modal Large Language Models for Endoscopy Analysis | May 29, 2025 | Diagnostic Visual Prompting
— Unverified 0 | Spoken question answering for visual queries | May 29, 2025 | Question Answering Visual Question Answering (VQA)
— Unverified 0 | Synthetic Document Question Answering in Hungarian | May 29, 2025 | Optical Character Recognition (OCR) Question Answering
Code Code Available 0 | MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence | May 29, 2025 | Multiple-choice Spatial Reasoning
— Unverified 0 | NegVQA: Can Vision Language Models Understand Negation? | May 28, 2025 | Negation Question Answering
— Unverified 0 | VIGNETTE: Socially Grounded Bias Evaluation for Vision-Language Models | May 28, 2025 | Decision Making Question Answering
Code Code Available 0 | FRAMES-VQA: Benchmarking Fine-Tuning Robustness across Multi-Modal Shifts in Visual Question Answering | May 27, 2025 | Benchmarking Question Answering
Code Code Available 0 | Silence is Not Consensus: Disrupting Agreement Bias in Multi-Agent LLMs via Catfish Agent for Clinical Decision Making | May 27, 2025 | Decision Making Diagnostic
— Unverified 0 | Diagnosing and Mitigating Modality Interference in Multimodal Large Language Models | May 26, 2025 | image-classification Image Classification
Code Code Available 0 | Benchmarking Large Multimodal Models for Ophthalmic Visual Question Answering with OphthalWeChat | May 26, 2025 | Benchmarking Question Answering
— Unverified 0 | MMIG-Bench: Towards Comprehensive and Explainable Evaluation of Multi-Modal Image Generation Models | May 26, 2025 | Image Generation Visual Question Answering (VQA)
— Unverified 0 | TDVE-Assessor: Benchmarking and Evaluating the Quality of Text-Driven Video Editing with LMMs | May 26, 2025 | Benchmarking Large Language Model
— Unverified 0 | Medical Large Vision Language Models with Multi-Image Visual Ability | May 25, 2025 | Question Answering Visual Question Answering (VQA)
Code Code Available 0 | GC-KBVQA: A New Four-Stage Framework for Enhancing Knowledge Based Visual Question Answering Performance | May 25, 2025 | Caption Generation Question Answering
— Unverified 0 | NTIRE 2025 Challenge on Video Quality Enhancement for Video Conferencing: Datasets, Methods and Results | May 25, 2025 | valid Video Quality Assessment
Code Code Available 0 | Improving Medical Reasoning with Curriculum-Aware Reinforcement Learning | May 25, 2025 | Out-of-Distribution Generalization reinforcement-learning
— Unverified 0 | Focus on What Matters: Enhancing Medical Vision-Language Models with Automatic Attention Alignment Tuning | May 24, 2025 | Visual Question Answering (VQA)
— Unverified 0 | CT-Agent: A Multimodal-LLM Agent for 3D CT Radiology Question Answering | May 22, 2025 | Computed Tomography (CT) Question Answering
— Unverified 0 | A Causal Approach to Mitigate Modality Preference Bias in Medical Visual Question Answering | May 22, 2025 | counterfactual Medical Visual Question Answering
— Unverified 0 | Grounding Chest X-Ray Visual Question Answering with Generated Radiology Reports | May 22, 2025 | Answer Generation Question Answering
— Unverified 0 | Steering LVLMs via Sparse Autoencoder for Hallucination Mitigation | May 22, 2025 | Hallucination Image Captioning
— Unverified 0 | MedFrameQA: A Multi-Image Medical VQA Benchmark for Clinical Reasoning | May 22, 2025 | Diagnostic Visual Question Answering (VQA)
— Unverified 0 | Zero-Shot Anomaly Detection in Battery Thermal Images Using Visual Question Answering with Prior Knowledge | May 22, 2025 | Anomaly Detection Question Answering
— Unverified 0