Building Trustworthy Multimodal AI: A Review of Fairness, Transparency, and Ethics in Vision-Language Tasks (Apr 14, 2025). Tasks: Ethics, Fairness
Unverified
AeroLite: Tag-Guided Lightweight Generation of Aerial Image Captions (Apr 13, 2025). Tasks: Image Captioning, TAG
Unverified
Metropolis-Hastings Captioning Game: Knowledge Fusion of Vision Language Models via Decentralized Bayesian Inference (Apr 13, 2025). Tasks: Bayesian Inference, Image Captioning
Unverified
Embodied Image Captioning: Self-supervised Learning Agents for Spatially Coherent Image Descriptions (Apr 11, 2025). Tasks: Contrastive Learning, Image Captioning
Unverified
AstroLLaVA: towards the unification of astronomical data and natural language (Apr 11, 2025). Tasks: Astronomy, Image Captioning
Unverified
Impact of Language Guidance: A Reproducibility Study (Apr 10, 2025). Tasks: Contrastive Learning, Image Captioning
Unverified
How Can Objects Help Video-Language Understanding? (Apr 10, 2025). Tasks: Image Captioning, Object
Unverified
RS-RAG: Bridging Remote Sensing Imagery and Comprehensive Knowledge with a Multi-Modal Dataset and Retrieval-Augmented Generation Model (Apr 7, 2025). Tasks: Image Captioning, image-classification
Unverified
MORAL: A Multimodal Reinforcement Learning Framework for Decision Making in Autonomous Laboratories (Apr 4, 2025). Tasks: Decision Making, Image Captioning
Unverified
Group-based Distinctive Image Captioning with Memory Difference Encoding and Attention (Apr 3, 2025). Tasks: Caption Generation, Contrastive Learning
Unverified
A Conformal Risk Control Framework for Granular Word Assessment and Uncertainty Calibration of CLIPScore Quality Estimates (Apr 1, 2025). Tasks: Image Captioning
Unverified
Context-Independent OCR with Multimodal LLMs: Effects of Image Resolution and Visual Complexity (Mar 31, 2025). Tasks: Image Captioning, Optical Character Recognition
Unverified
Semantic-Spatial Feature Fusion with Dynamic Graph Refinement for Remote Sensing Image Captioning (Mar 30, 2025). Tasks: Graph Attention, Image Captioning
Unverified
JEEM: Vision-Language Understanding in Four Arabic Dialects (Mar 27, 2025). Tasks: Image Captioning, Question Answering
Unverified
Mitigating Low-Level Visual Hallucinations Requires Self-Awareness: Database, Model and Training Strategy (Mar 26, 2025). Tasks: Hallucination, Image Captioning
Unverified
Improved Alignment of Modalities in Large Vision Language Models (Mar 25, 2025). Tasks: GPU, Image Captioning
Unverified
Reverse Prompt: Cracking the Recipe Inside Text-to-Image Generation (Mar 25, 2025). Tasks: Image Captioning, Image Generation
Unverified
UniCrossAdapter: Multimodal Adaptation of CLIP for Radiology Report Generation (Mar 20, 2025). Tasks: Image Captioning, Transfer Learning
Code available
Natural Language Generation (Mar 20, 2025). Tasks: Image Captioning, Image to text
Unverified
Disentangling Fine-Tuning from Pre-Training in Visual Captioning with Hybrid Markov Logic (Mar 18, 2025). Tasks: General Knowledge, Image Captioning
Code available
Unified Autoregressive Visual Generation and Understanding with Continuous Tokens (Mar 17, 2025). Tasks: Image Captioning, Image Generation
Unverified
GeoRSMLLM: A Multimodal Large Language Model for Vision-Language Tasks in Geoscience and Remote Sensing (Mar 16, 2025). Tasks: Change Detection, Image Captioning
Unverified
CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era (Mar 16, 2025). Tasks: Benchmarking, Image Captioning
Unverified
RONA: Pragmatically Diverse Image Captioning with Coherence Relations (Mar 14, 2025). Tasks: Diversity, Image Captioning
Code available
Taxonomic Reasoning for Rare Arthropods: Combining Dense Image Captioning and RAG for Interpretable Classification (Mar 13, 2025). Tasks: Image Captioning, RAG
Unverified
Florenz: Scaling Laws for Systematic Generalization in Vision-Language Models (Mar 12, 2025). Tasks: Cross-Lingual Transfer, Image Captioning
Unverified
Astrea: A MOE-based Visual Understanding Model with Progressive Alignment (Mar 12, 2025). Tasks: Contrastive Learning, Cross-Modal Retrieval
Unverified
ComicsPAP: understanding comic strips by picking the correct panel (Mar 11, 2025). Tasks: Image Captioning, Visual Question Answering (VQA)
Unverified
Measuring directional bias amplification in image captions using predictability (Mar 10, 2025). Tasks: Image Captioning, image-classification
Unverified
Improving cognitive diagnostics in pathology: a deep learning approach for augmenting perceptional understanding of histopathology images (Mar 10, 2025). Tasks: Diagnostic, Image Captioning
Unverified
PerturboLLaVA: Reducing Multimodal Hallucinations with Perturbative Visual Training (Mar 9, 2025). Tasks: Hallucination, Image Captioning
Unverified
From Captions to Rewards (CAREVL): Leveraging Large Language Model Experts for Enhanced Reward Modeling in Large Vision-Language Models (Mar 8, 2025). Tasks: Image Captioning, Language Modeling
Unverified
Treble Counterfactual VLMs: A Causal Approach to Hallucination (Mar 8, 2025). Tasks: Autonomous Driving, counterfactual
Code available
A Benchmark for Multi-Lingual Vision-Language Learning in Remote Sensing Image Captioning (Mar 6, 2025). Tasks: Descriptive, Image Captioning
Code available
Group Relative Policy Optimization for Image Captioning (Mar 3, 2025). Tasks: Diversity, Image Captioning
Code available
AC-Lite : A Lightweight Image Captioning Model for Low-Resource Assamese Language (Mar 3, 2025). Tasks: Decoder, Image Captioning
Unverified
Exploring Causes and Mitigation of Hallucinations in Large Vision Language Models (Feb 24, 2025). Tasks: Hallucination, Image Captioning
Unverified
Are Large Language Models Good Data Preprocessors? (Feb 24, 2025). Tasks: Image Captioning
Unverified
Fine-Grained Video Captioning through Scene Graph Consolidation (Feb 23, 2025). Tasks: Caption Generation, Image Captioning
Unverified
Good Representation, Better Explanation: Role of Convolutional Neural Networks in Transformer-Based Remote Sensing Image Captioning (Feb 22, 2025). Tasks: Decoder, Image Captioning
Unverified
ReVision: A Dataset and Baseline VLM for Privacy-Preserving Task-Oriented Visual Instruction Rewriting (Feb 20, 2025). Tasks: Image Captioning, multimodal interaction
Unverified
A Chain-of-Thought Subspace Meta-Learning for Few-shot Image Captioning with Large Vision and Language Models (Feb 19, 2025). Tasks: Image Captioning, Language Modeling
Unverified
InsightVision: A Comprehensive, Multi-Level Chinese-based Benchmark for Evaluating Implicit Visual Semantics in Large Vision Language Models (Feb 19, 2025). Tasks: Image Captioning
Unverified
GroundCap: A Visually Grounded Image Captioning Dataset (Feb 19, 2025). Tasks: Image Captioning, Object Detection
Unverified
Pretrained Image-Text Models are Secretly Video Captioners (Feb 19, 2025). Tasks: Image Captioning, Video Captioning
Code available
What Is a Good Caption? A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Thoroughness (Feb 19, 2025). Tasks: Image Captioning, Keyword Extraction
Unverified
TPCap: Unlocking Zero-Shot Image Captioning with Trigger-Augmented and Multi-Modal Purification Modules (Feb 16, 2025). Tasks: GPU, Image Captioning
Unverified
VisCon-100K: Leveraging Contextual Web Data for Fine-tuning Vision Language Models (Feb 14, 2025). Tasks: Image Captioning, Large Language Model
Unverified
FE-LWS: Refined Image-Text Representations via Decoder Stacking and Fused Encodings for Remote Sensing Image Captioning (Feb 13, 2025). Tasks: Caption Generation, Decoder
Unverified
Vision-Language Models for Edge Networks: A Comprehensive Survey (Feb 11, 2025). Tasks: Autonomous Vehicles, Image Captioning
Unverified