RefChartQA: Grounding Visual Answer on Chart Images through Instruction Tuning | Mar 29, 2025 | Chart Question Answering Chart Understanding | Code | Code Available | 1
Can DeepSeek Reason Like a Surgeon? An Empirical Evaluation for Vision-Language Understanding in Robotic-Assisted Surgery | Mar 29, 2025 | Action Understanding Instrument Recognition | - | Unverified | 0
Patience is all you need! An agentic system for performing scientific literature review | Mar 28, 2025 | All Articles | - | Unverified | 0
How Well Can Vison-Language Models Understand Humans' Intention? An Open-ended Theory of Mind Question Evaluation Benchmark | Mar 28, 2025 | Question Answering Visual Question Answering | - | Unverified | 0
Preference-based Learning with Retrieval Augmented Generation for Conversational Question Answering | Mar 28, 2025 | Conversational Question Answering Question Answering | Code | Code Available | 0
EgoToM: Benchmarking Theory of Mind Reasoning from Egocentric Videos | Mar 28, 2025 | Benchmarking Question Answering | Code | Code Available | 1
AssistPDA: An Online Video Surveillance Assistant for Video Anomaly Prediction, Detection, and Analysis | Mar 27, 2025 | Anomaly Detection Anomaly Forecasting | - | Unverified | 0
MemInsight: Autonomous Memory Augmentation for LLM Agents | Mar 27, 2025 | Conversational Recommendation Language Modeling | - | Unverified | 0
JEEM: Vision-Language Understanding in Four Arabic Dialects | Mar 27, 2025 | Image Captioning Question Answering | - | Unverified | 0
CTRL-O: Language-Controllable Object-Centric Visual Representation Learning | Mar 27, 2025 | Image Generation Object | - | Unverified | 0
ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation | Mar 27, 2025 | Question Answering RAG | Code | Code Available | 1
SWI: Speaking with Intent in Large Language Models | Mar 27, 2025 | Mathematical Reasoning Question Answering | Code | Code Available | 0
Leveraging LLMs with Iterative Loop Structure for Enhanced Social Intelligence in Video Question Answering | Mar 27, 2025 | Emotion Recognition Question Answering | - | Unverified | 0
AskSport: Web Application for Sports Question-Answering | Mar 27, 2025 | Question Answering | - | Unverified | 0
Fine-Grained Evaluation of Large Vision-Language Models in Autonomous Driving | Mar 27, 2025 | Attribute Autonomous Driving | Code | Code Available | 1
FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs | Mar 27, 2025 | Attribute Benchmarking | Code | Code Available | 1
A Survey of Multimodal Retrieval-Augmented Generation | Mar 26, 2025 | Information Retrieval Question Answering | - | Unverified | 0
Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields | Mar 26, 2025 | Question Answering Visual Question Answering | - | Unverified | 0
Vision-Amplified Semantic Entropy for Hallucination Detection in Medical Visual Question Answering | Mar 26, 2025 | Diagnostic Hallucination | - | Unverified | 0
Mitigating Low-Level Visual Hallucinations Requires Self-Awareness: Database, Model and Training Strategy | Mar 26, 2025 | Hallucination Image Captioning | - | Unverified | 0
Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs | Mar 26, 2025 | Hallucination Hallucination Evaluation | - | Unverified | 0
Unified Multimodal Discrete Diffusion | Mar 26, 2025 | Image Captioning Image Generation | Code | Code Available | 2
Self-ReS: Self-Reflection in Large Vision-Language Models for Long Video Understanding | Mar 26, 2025 | GPU Question Answering | - | Unverified | 0
KSHSeek: Data-Driven Approaches to Mitigating and Detecting Knowledge-Shortcut Hallucinations in Generative Models | Mar 25, 2025 | Hallucination Question Answering | - | Unverified | 0
VectorFit : Adaptive Singular & Bias Vector Fine-Tuning of Pre-trained Foundation Models | Mar 25, 2025 | image-classification Image Classification | - | Unverified | 0
Context-Efficient Retrieval with Factual Decomposition | Mar 25, 2025 | Form Information Retrieval | - | Unverified | 0
DomainCQA: Crafting Expert-Level QA from Domain-Specific Charts | Mar 25, 2025 | Astronomy Chart Question Answering | - | Unverified | 0
ImF: Implicit Fingerprint for Large Language Models | Mar 25, 2025 | Adversarial Attack Question Answering | - | Unverified | 0
Improved Alignment of Modalities in Large Vision Language Models | Mar 25, 2025 | GPU Image Captioning | - | Unverified | 0
Can Vision-Language Models Answer Face to Face Questions in the Real-World? | Mar 25, 2025 | Question Answering | - | Unverified | 0
ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation | Mar 25, 2025 | Action Generation Autonomous Driving | - | Unverified | 0
Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models | Mar 25, 2025 | Benchmarking Image Captioning | Code | Code Available | 1
VGAT: A Cancer Survival Analysis Framework Transitioning from Generative Visual Question Answering to Genomic Reconstruction | Mar 25, 2025 | Generative Visual Question Answering Question Answering | Code | Code Available | 0
BiblioPage: A Dataset of Scanned Title Pages for Bibliographic Metadata Extraction | Mar 25, 2025 | document understanding object-detection | Code | Code Available | 0
LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning? | Mar 25, 2025 | Autonomous Navigation Question Answering | - | Unverified | 0
PAVE: Patching and Adapting Video Large Language Models | Mar 25, 2025 | Audio-visual Question Answering Multi-Task Learning | Code | Code Available | 1
Med3DVLM: An Efficient Vision-Language Model for 3D Medical Image Analysis | Mar 25, 2025 | Contrastive Learning Image-text Retrieval | Code | Code Available | 2
DeCAP: Context-Adaptive Prompt Generation for Debiasing Zero-shot Question Answering in Large Language Models | Mar 25, 2025 | Fairness Question Answering | - | Unverified | 0
Open-Vocabulary Functional 3D Scene Graphs for Real-World Indoor Spaces | Mar 24, 2025 | Question Answering | - | Unverified | 0
Where is this coming from? Making groundedness count in the evaluation of Document VQA models | Mar 24, 2025 | Question Answering Visual Question Answering | - | Unverified | 0
A Survey of Large Language Model Agents for Question Answering | Mar 24, 2025 | Answer Generation Information Retrieval | - | Unverified | 0
When is dataset cartography ineffective? Using training dynamics does not improve robustness against Adversarial SQuAD | Mar 24, 2025 | Adversarial Robustness Extractive Question-Answering | - | Unverified | 0
MC-LLaVA: Multi-Concept Personalized Vision-Language Model | Mar 24, 2025 | Language Modeling Language Modelling | Code | Code Available | 2
MAGIC-VQA: Multimodal And Grounded Inference with Commonsense Knowledge for Visual Question Answering | Mar 24, 2025 | Graph Neural Network Question Answering | - | Unverified | 0
DiN: Diffusion Model for Robust Medical VQA with Semantic Noisy Labels | Mar 24, 2025 | Medical Visual Question Answering Question Answering | - | Unverified | 0
Synthetic Function Demonstrations Improve Generation in Low-Resource Programming Languages | Mar 24, 2025 | Question Answering RAG | - | Unverified | 0
LLaVAction: evaluating and training multi-modal large language models for action recognition | Mar 24, 2025 | Action Recognition Action Understanding | Code | Code Available | 2
Expanding the Boundaries of Vision Prior Knowledge in Multi-modal Large Language Models | Mar 23, 2025 | Question Answering Visual Question Answering | - | Unverified | 0
SLIDE: Sliding Localized Information for Document Extraction | Mar 23, 2025 | Chunking graph construction | - | Unverified | 0
Retrieval Augmented Generation and Understanding in Vision: A Survey and New Outlook | Mar 23, 2025 | 3D Generation Medical Report Generation | Code | Code Available | 3