Vision-Amplified Semantic Entropy for Hallucination Detection in Medical Visual Question Answering | Mar 26, 2025 | Diagnostic Hallucination | Unverified
ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation | Mar 25, 2025 | Action Generation Autonomous Driving | Unverified
LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning? | Mar 25, 2025 | Autonomous Navigation Question Answering | Unverified
VGAT: A Cancer Survival Analysis Framework Transitioning from Generative Visual Question Answering to Genomic Reconstruction | Mar 25, 2025 | Generative Visual Question Answering Question Answering | Code Available
DiN: Diffusion Model for Robust Medical VQA with Semantic Noisy Labels | Mar 24, 2025 | Medical Visual Question Answering Question Answering | Unverified
MAGIC-VQA: Multimodal And Grounded Inference with Commonsense Knowledge for Visual Question Answering | Mar 24, 2025 | Graph Neural Network Question Answering | Unverified
Where is this coming from? Making groundedness count in the evaluation of Document VQA models | Mar 24, 2025 | Question Answering Visual Question Answering | Unverified
Expanding the Boundaries of Vision Prior Knowledge in Multi-modal Large Language Models | Mar 23, 2025 | Question Answering Visual Question Answering | Unverified
Progressive Prompt Detailing for Improved Alignment in Text-to-Image Generative Models | Mar 22, 2025 | Question Answering Visual Question Answering | Code Available
A Vision Centric Remote Sensing Benchmark | Mar 20, 2025 | Question Answering Representation Learning | Unverified
TruthLens: A Training-Free Paradigm for DeepFake Detection | Mar 19, 2025 | Binary Classification DeepFake Detection | Unverified
UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation | Mar 19, 2025 | Language Model Evaluation Language Modeling | Unverified
ChatBEV: A Visual Language Model that Understands BEV Maps | Mar 18, 2025 | Autonomous Driving Language Modeling | Unverified
Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding | Mar 18, 2025 | document understanding Question Answering | Code Available
GeoRSMLLM: A Multimodal Large Language Model for Vision-Language Tasks in Geoscience and Remote Sensing | Mar 16, 2025 | Change Detection Image Captioning | Unverified
T2I-FineEval: Fine-Grained Compositional Metric for Text-to-Image Evaluation | Mar 14, 2025 | Attribute Question Answering | Code Available
DynRsl-VLM: Enhancing Autonomous Driving Perception with Dynamic Resolution Vision-Language Models | Mar 14, 2025 | Autonomous Driving Computational Efficiency | Unverified
Astrea: A MOE-based Visual Understanding Model with Progressive Alignment | Mar 12, 2025 | Contrastive Learning Cross-Modal Retrieval | Unverified
SurgicalVLM-Agent: Towards an Interactive AI Co-Pilot for Pituitary Surgery | Mar 12, 2025 | Activity Recognition Anatomy | Unverified
ComicsPAP: understanding comic strips by picking the correct panel | Mar 11, 2025 | Image Captioning Visual Question Answering (VQA) | Unverified
Seeing and Reasoning with Confidence: Supercharging Multimodal LLMs with an Uncertainty-Aware Agentic Framework | Mar 11, 2025 | Conformal Prediction Multimodal Reasoning | Unverified
Bring Remote Sensing Object Detect Into Nature Language Model: Using SFT Method | Mar 11, 2025 | Language Modeling Language Modelling | Unverified
Robusto-1 Dataset: Comparing Humans and VLMs on real out-of-distribution Autonomous Driving VQA from Peru | Mar 10, 2025 | Autonomous Driving Question Answering | Unverified
CalliReader: Contextualizing Chinese Calligraphy via an Embedding-Aligned Vision-Language Model | Mar 9, 2025 | Hallucination Language Modeling | Unverified
SplatTalk: 3D VQA with Gaussian Splatting | Mar 8, 2025 | 3DGS Question Answering | Unverified
MoEMoE: Question Guided Dense and Scalable Sparse Mixture-of-Expert for Multi-source Multi-modal Answering | Mar 8, 2025 | Answer Generation Mixture-of-Experts | Unverified
Integrating Frequency-Domain Representations with Low-Rank Adaptation in Vision-Language Models | Mar 8, 2025 | Caption Generation Question Answering | Unverified
Enhancing Vietnamese VQA through Curriculum Learning on Raw and Augmented Text Representations | Mar 5, 2025 | Question Answering Visual Question Answering | Code Available
BioD2C: A Dual-level Semantic Consistency Constraint Framework for Biomedical VQA | Mar 4, 2025 | Medical Diagnosis Question Answering | Code Available
A Token-level Text Image Foundation Model for Document Understanding | Mar 4, 2025 | document understanding Visual Question Answering (VQA) | Unverified
Watch Out Your Album! On the Inadvertent Privacy Memorization in Multi-Modal Large Language Models | Mar 3, 2025 | Memorization Question Answering | Code Available
V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts | Mar 3, 2025 | Contrastive Learning Text Retrieval | Unverified
Enhancing Multi-hop Reasoning in Vision-Language Models via Self-Distillation with Multi-Prompt Ensembling | Mar 3, 2025 | Answer Generation Computational Efficiency | Unverified
FunBench: Benchmarking Fundus Reading Skills of MLLMs | Mar 2, 2025 | Anatomy Benchmarking | Unverified
CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering | Mar 1, 2025 | Continual Learning Language Modeling | Unverified
ABC: Achieving Better Control of Multimodal Embeddings using VLMs | Mar 1, 2025 | Image to text Image-to-Text Retrieval | Unverified
MedHallTune: An Instruction-Tuning Benchmark for Mitigating Medical Hallucination in Vision-Language Models | Feb 28, 2025 | Decision Making Hallucination | Code Available
Fine-Grained Retrieval-Augmented Generation for Visual Question Answering | Feb 28, 2025 | Question Answering RAG | Unverified
Adaptive Score Alignment Learning for Continual Perceptual Quality Assessment of 360-Degree Videos in Virtual Reality | Feb 27, 2025 | Video Quality Assessment Visual Question Answering (VQA) | Code Available
ChatReID: Open-ended Interactive Person Retrieval via Hierarchical Progressive Tuning for Vision Language Models | Feb 27, 2025 | Person Re-Identification Person Retrieval | Unverified
Talking to the brain: Using Large Language Models as Proxies to Model Brain Semantic Representation | Feb 26, 2025 | Question Answering valid | Unverified
FilterRAG: Zero-Shot Informed Retrieval-Augmented Generation to Mitigate Hallucinations in VQA | Feb 25, 2025 | Question Answering Retrieval | Unverified
Retrieval-Augmented Visual Question Answering via Built-in Autoregressive Search Engines | Feb 23, 2025 | Answer Generation Language Modeling | Unverified
Directional Gradient Projection for Robust Fine-Tuning of Foundation Models | Feb 21, 2025 | image-classification Image Classification | Unverified
Multiscale Byte Language Models -- A Hierarchical Architecture for Causal Million-Length Sequence Modeling | Feb 20, 2025 | Decoder GPU | Code Available
Exploring Advanced Techniques for Visual Question Answering: A Comprehensive Comparison | Feb 20, 2025 | Diversity Language Modeling | Unverified
Hardware-Friendly Static Quantization Method for Video Diffusion Transformers | Feb 20, 2025 | Quantization Video Generation | Unverified
Sce2DriveX: A Generalized MLLM Framework for Scene-to-Drive Learning | Feb 19, 2025 | Autonomous Driving Bench2Drive | Unverified
PitVQA++: Vector Matrix-Low-Rank Adaptation for Open-Ended Visual Question Answering in Pituitary Surgery | Feb 19, 2025 | Question Answering Visual Question Answering | Code Available
SafeEraser: Enhancing Safety in Multimodal Large Language Models through Multimodal Machine Unlearning | Feb 18, 2025 | Machine Unlearning Visual Question Answering (VQA) | Unverified