SilVar-Med: A Speech-Driven Visual Language Model for Explainable Abnormality Detection in Medical Imaging (Apr 14, 2025). Tasks: Anomaly Detection, Diagnostic. Code available (1)
A Survey on Efficient Vision-Language Models (Apr 13, 2025). Tasks: Image Captioning, Question Answering. Code available (1)
AeroLite: Tag-Guided Lightweight Generation of Aerial Image Captions (Apr 13, 2025). Tasks: Image Captioning, TAG. Unverified (0)
Metropolis-Hastings Captioning Game: Knowledge Fusion of Vision Language Models via Decentralized Bayesian Inference (Apr 13, 2025). Tasks: Bayesian Inference, Image Captioning. Unverified (0)
AstroLLaVA: towards the unification of astronomical data and natural language (Apr 11, 2025). Tasks: Astronomy, Image Captioning. Unverified (0)
Embodied Image Captioning: Self-supervised Learning Agents for Spatially Coherent Image Descriptions (Apr 11, 2025). Tasks: Contrastive Learning, Image Captioning. Unverified (0)
Impact of Language Guidance: A Reproducibility Study (Apr 10, 2025). Tasks: Contrastive Learning, Image Captioning. Unverified (0)
How Can Objects Help Video-Language Understanding? (Apr 10, 2025). Tasks: Image Captioning, Object. Unverified (0)
OmniCaptioner: One Captioner to Rule Them All (Apr 9, 2025). Tasks: All, Image Captioning. Code available (2)
RS-RAG: Bridging Remote Sensing Imagery and Comprehensive Knowledge with a Multi-Modal Dataset and Retrieval-Augmented Generation Model (Apr 7, 2025). Tasks: Image Captioning, image-classification. Unverified (0)
MORAL: A Multimodal Reinforcement Learning Framework for Decision Making in Autonomous Laboratories (Apr 4, 2025). Tasks: Decision Making, Image Captioning. Unverified (0)
Group-based Distinctive Image Captioning with Memory Difference Encoding and Attention (Apr 3, 2025). Tasks: Caption Generation, Contrastive Learning. Unverified (0)
A Conformal Risk Control Framework for Granular Word Assessment and Uncertainty Calibration of CLIPScore Quality Estimates (Apr 1, 2025). Tasks: Image Captioning. Unverified (0)
Context-Independent OCR with Multimodal LLMs: Effects of Image Resolution and Visual Complexity (Mar 31, 2025). Tasks: Image Captioning, Optical Character Recognition. Unverified (0)
Semantic-Spatial Feature Fusion with Dynamic Graph Refinement for Remote Sensing Image Captioning (Mar 30, 2025). Tasks: Graph Attention, Image Captioning. Unverified (0)
JEEM: Vision-Language Understanding in Four Arabic Dialects (Mar 27, 2025). Tasks: Image Captioning, Question Answering. Unverified (0)
Unified Multimodal Discrete Diffusion (Mar 26, 2025). Tasks: Image Captioning, Image Generation. Code available (2)
Mitigating Low-Level Visual Hallucinations Requires Self-Awareness: Database, Model and Training Strategy (Mar 26, 2025). Tasks: Hallucination, Image Captioning. Unverified (0)
Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models (Mar 25, 2025). Tasks: Benchmarking, Image Captioning. Code available (1)
Improved Alignment of Modalities in Large Vision Language Models (Mar 25, 2025). Tasks: GPU, Image Captioning. Unverified (0)
Reverse Prompt: Cracking the Recipe Inside Text-to-Image Generation (Mar 25, 2025). Tasks: Image Captioning, Image Generation. Unverified (0)
Natural Language Generation (Mar 20, 2025). Tasks: Image Captioning, Image to text. Unverified (0)
UniCrossAdapter: Multimodal Adaptation of CLIP for Radiology Report Generation (Mar 20, 2025). Tasks: Image Captioning, Transfer Learning. Code available (0)
Image Captioning Evaluation in the Age of Multimodal LLMs: Challenges and Future Perspectives (Mar 18, 2025). Tasks: Image Captioning. Code available (1)
Disentangling Fine-Tuning from Pre-Training in Visual Captioning with Hybrid Markov Logic (Mar 18, 2025). Tasks: General Knowledge, Image Captioning. Code available (0)
Unified Autoregressive Visual Generation and Understanding with Continuous Tokens (Mar 17, 2025). Tasks: Image Captioning, Image Generation. Unverified (0)
CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era (Mar 16, 2025). Tasks: Benchmarking, Image Captioning. Unverified (0)
GeoRSMLLM: A Multimodal Large Language Model for Vision-Language Tasks in Geoscience and Remote Sensing (Mar 16, 2025). Tasks: Change Detection, Image Captioning. Unverified (0)
Will Pre-Training Ever End? A First Step Toward Next-Generation Foundation MLLMs via Self-Improving Systematic Cognition (Mar 16, 2025). Tasks: Caption Generation, Image Captioning. Code available (1)
Falcon: A Remote Sensing Vision-Language Foundation Model (Mar 14, 2025). Tasks: Image Captioning, image-classification. Code available (3)
RONA: Pragmatically Diverse Image Captioning with Coherence Relations (Mar 14, 2025). Tasks: Diversity, Image Captioning. Code available (0)
Taxonomic Reasoning for Rare Arthropods: Combining Dense Image Captioning and RAG for Interpretable Classification (Mar 13, 2025). Tasks: Image Captioning, RAG. Unverified (0)
Florenz: Scaling Laws for Systematic Generalization in Vision-Language Models (Mar 12, 2025). Tasks: Cross-Lingual Transfer, Image Captioning. Unverified (0)
Astrea: A MOE-based Visual Understanding Model with Progressive Alignment (Mar 12, 2025). Tasks: Contrastive Learning, Cross-Modal Retrieval. Unverified (0)
ComicsPAP: understanding comic strips by picking the correct panel (Mar 11, 2025). Tasks: Image Captioning, Visual Question Answering (VQA). Unverified (0)
Measuring directional bias amplification in image captions using predictability (Mar 10, 2025). Tasks: Image Captioning, image-classification. Unverified (0)
Improving cognitive diagnostics in pathology: a deep learning approach for augmenting perceptional understanding of histopathology images (Mar 10, 2025). Tasks: Diagnostic, Image Captioning. Unverified (0)
PerturboLLaVA: Reducing Multimodal Hallucinations with Perturbative Visual Training (Mar 9, 2025). Tasks: Hallucination, Image Captioning. Unverified (0)
From Captions to Rewards (CAREVL): Leveraging Large Language Model Experts for Enhanced Reward Modeling in Large Vision-Language Models (Mar 8, 2025). Tasks: Image Captioning, Language Modeling. Unverified (0)
Treble Counterfactual VLMs: A Causal Approach to Hallucination (Mar 8, 2025). Tasks: Autonomous Driving, counterfactual. Code available (0)
Keeping Yourself is Important in Downstream Tuning Multimodal Large Language Model (Mar 6, 2025). Tasks: General Knowledge, Image Captioning. Code available (2)
A Benchmark for Multi-Lingual Vision-Language Learning in Remote Sensing Image Captioning (Mar 6, 2025). Tasks: Descriptive, Image Captioning. Code available (0)
AC-Lite: A Lightweight Image Captioning Model for Low-Resource Assamese Language (Mar 3, 2025). Tasks: Decoder, Image Captioning. Unverified (0)
Group Relative Policy Optimization for Image Captioning (Mar 3, 2025). Tasks: Diversity, Image Captioning. Code available (0)
Exploring Causes and Mitigation of Hallucinations in Large Vision Language Models (Feb 24, 2025). Tasks: Hallucination, Image Captioning. Unverified (0)
Are Large Language Models Good Data Preprocessors? (Feb 24, 2025). Tasks: Image Captioning. Unverified (0)
Benchmarking Retrieval-Augmented Generation in Multi-Modal Contexts (Feb 24, 2025). Tasks: Benchmarking, Fact Verification. Code available (2)
Fine-Grained Video Captioning through Scene Graph Consolidation (Feb 23, 2025). Tasks: Caption Generation, Image Captioning. Unverified (0)
Good Representation, Better Explanation: Role of Convolutional Neural Networks in Transformer-Based Remote Sensing Image Captioning (Feb 22, 2025). Tasks: Decoder, Image Captioning. Unverified (0)
Weakly Supervised Video Scene Graph Generation via Natural Language Supervision (Feb 21, 2025). Tasks: Graph Generation, Image Captioning. Code available (1)