Cross-Modal Consistency in Multimodal Large Language Models · Nov 14, 2024 · Image Captioning, object-detection · Unverified · 0
Bridging the Visual Gap: Fine-Tuning Multimodal Models with Knowledge-Adapted Captions · Nov 13, 2024 · Descriptive, Hallucination · Code available · 0
Grounded Video Caption Generation · Nov 12, 2024 · Caption Generation, Image Captioning · Unverified · 0
BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions · Nov 12, 2024 · Descriptive, Image Captioning · Unverified · 0
ViTOC: Vision Transformer and Object-aware Captioner · Nov 9, 2024 · Diversity, Image Captioning · Unverified · 0
Image2Text2Image: A Novel Framework for Label-Free Evaluation of Image-to-Text Generation with Text-to-Image Diffusion Models · Nov 8, 2024 · Image Captioning, Image Generation · Unverified · 0
Precision or Recall? An Analysis of Image Captions for Training Text-to-Image Generation Model · Nov 7, 2024 · Image Captioning, Image Generation · Code available · 0
Seeing is Deceiving: Exploitation of Visual Pathways in Multi-Modal Language Models · Nov 7, 2024 · Adversarial Attack, Image Captioning · Unverified · 0
LLM2CLIP: Powerful Language Model Unlocks Richer Visual Representation · Nov 7, 2024 · Contrastive Learning, Image Captioning · Code available · 4
RS-MoE: Mixture of Experts for Remote Sensing Image Captioning and Visual Question Answering · Nov 3, 2024 · Descriptive, Image Captioning · Unverified · 0
Designing a Robust Radiology Report Generation System · Nov 2, 2024 · Decision Making, Diagnostic · Unverified · 0
Aggregate-and-Adapt Natural Language Prompts for Downstream Generalization of CLIP · Oct 31, 2024 · Image Captioning, Prompt Learning · Unverified · 0
Nearest Neighbor Normalization Improves Multimodal Retrieval · Oct 31, 2024 · Cross-Modal Retrieval, Image Captioning · Code available · 1
Large Language Model Benchmarks in Medical Tasks · Oct 28, 2024 · Image Captioning, Language Modeling · Unverified · 0
Image Generation from Image Captioning -- Invertible Approach · Oct 26, 2024 · Image Captioning, Image Generation · Unverified · 0
Decoding Diffusion: A Scalable Framework for Unsupervised Analysis of Latent Space Biases and Representations Using Natural Language Prompts · Oct 25, 2024 · Denoising, Image Captioning · Unverified · 0
Backdoor in Seconds: Unlocking Vulnerabilities in Large Pre-trained Models via Model Editing · Oct 23, 2024 · Adversarial Attack, Backdoor Attack · Unverified · 0
ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning · Oct 23, 2024 · Image Captioning, Instruction Following · Code available · 1
Altogether: Image Captioning via Re-aligning Alt-text · Oct 22, 2024 · Image Captioning, image-classification · Code available · 0
Frontiers in Intelligent Colonoscopy · Oct 22, 2024 · Image Captioning · Code available · 2
VipAct: Visual-Perception Enhancement via Specialized VLM Agent Collaboration and Tool-use · Oct 21, 2024 · Image Captioning, Task Planning · Unverified · 0
TIPS: Text-Image Pretraining with Spatial Awareness · Oct 21, 2024 · Depth Estimation, Image Captioning · Code available · 2
MI-VisionShot: Few-shot adaptation of vision-language models for slide-level classification of histopathological images · Oct 21, 2024 · Few-Shot Learning, Image Captioning · Code available · 0
An Efficient System for Automatic Map Storytelling -- A Case Study on Historical Maps · Oct 21, 2024 · Image Captioning · Code available · 0
RAP: Retrieval-Augmented Personalization for Multimodal Large Language Models · Oct 17, 2024 · Image Captioning, Question Answering · Code available · 2
Hiding-in-Plain-Sight (HiPS) Attack on CLIP for Targetted Object Removal from Images · Oct 16, 2024 · Image Captioning, Object · Unverified · 0
Self-adaptive Multimodal Retrieval-Augmented Generation · Oct 15, 2024 · Image Captioning, RAG · Code available · 0
MMCFND: Multimodal Multilingual Caption-aware Fake News Detection for Low-resource Indic Languages · Oct 14, 2024 · Articles, Descriptive · Unverified · 0
CLIP-SCGI: Synthesized Caption-Guided Inversion for Person Re-Identification · Oct 12, 2024 · Image Captioning, Person Re-Identification · Unverified · 0
A Unified Debiasing Approach for Vision-Language Models across Modalities and Tasks · Oct 10, 2024 · Fairness, Image Captioning · Code available · 0
An Eye for an Ear: Zero-shot Audio Description Leveraging an Image Captioner using Audiovisual Distribution Alignment · Oct 8, 2024 · Audio captioning, Contrastive Learning · Code available · 0
Core Tokensets for Data-efficient Sequential Training of Transformers · Oct 8, 2024 · Image Captioning, image-classification · Code available · 0
AnyAttack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models · Oct 7, 2024 · Image Captioning, Image-text Retrieval · Unverified · 0
CAPEEN: Image Captioning with Early Exits and Knowledge Distillation · Oct 6, 2024 · Descriptive, Image Captioning · Code available · 0
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark · Oct 4, 2024 · Image Captioning, Video Understanding · Unverified · 0
Quantifying the Gaps Between Translation and Native Perception in Training for Multimodal, Multilingual Retrieval · Oct 2, 2024 · Image Captioning, Retrieval · Unverified · 0
Backdooring Vision-Language Models with Out-Of-Distribution Data · Oct 2, 2024 · Image Captioning, Image to text · Unverified · 0
TROPE: TRaining-Free Object-Part Enhancement for Seamlessly Improving Fine-Grained Zero-Shot Image Captioning · Sep 30, 2024 · Image Captioning, Object · Code available · 0
TrojVLM: Backdoor Attack Against Vision Language Models · Sep 28, 2024 · Backdoor Attack, Image Captioning · Unverified · 0
DENEB: A Hallucination-Robust Automatic Evaluation Metric for Image Captioning · Sep 28, 2024 · Hallucination, Image Captioning · Unverified · 0
Enhancing Explainability in Multimodal Large Language Models Using Ontological Context · Sep 27, 2024 · Image Captioning, Question Answering · Unverified · 0
A TextGCN-Based Decoding Approach for Improving Remote Sensing Image Captioning · Sep 27, 2024 · Decoder, Fairness · Unverified · 0
IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning · Sep 26, 2024 · Image Captioning, Retrieval · Code available · 1
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models · Sep 25, 2024 · Image Captioning · Code available · 4
Brotherhood at WMT 2024: Leveraging LLM-Generated Contextual Conversations for Cross-Lingual Image Captioning · Sep 23, 2024 · Image Captioning, Semantic Similarity · Unverified · 0
Effectively Enhancing Vision Language Large Models by Prompt Augmentation and Caption Utilization · Sep 22, 2024 · Hallucination, Hallucination Evaluation · Code available · 0
@Bench: Benchmarking Vision-Language Models for Human-centered Assistive Technology · Sep 21, 2024 · Benchmarking, Depth Estimation · Unverified · 0
FullAnno: A Data Engine for Enhancing Image Comprehension of MLLMs · Sep 20, 2024 · Image Captioning, Image Comprehension · Unverified · 0
YesBut: A High-Quality Annotated Multimodal Dataset for evaluating Satire Comprehension capability of Vision-Language Models · Sep 20, 2024 · Benchmarking, Image Captioning · Code available · 1
Instruction-guided Multi-Granularity Segmentation and Captioning with Large Multimodal Model · Sep 20, 2024 · Image Captioning, Panoptic Segmentation · Code available · 1