Exploring Multi-Grained Concept Annotations for Multimodal Large Language Models | Dec 8, 2024 | Image Captioning | Code Available | 0
HMGIE: Hierarchical and Multi-Grained Inconsistency Evaluation for Vision-Language Data Cleansing | Dec 7, 2024 | Answer Generation, Graph Generation | Unverified | 0
Automated Medical Report Generation for ECG Data: Bridging Medical Text and Signal Processing with Deep Learning | Dec 5, 2024 | Comment Generation, Decoder | Code Available | 0
Personalizing Multimodal Large Language Models for Image Captioning: An Experimental Analysis | Dec 4, 2024 | Image Captioning, Image Description | Unverified | 0
Progress-Aware Video Frame Captioning | Dec 3, 2024 | Image Captioning, Video Captioning | Unverified | 0
CEGI: Measuring the trade-off between efficiency and carbon emissions for SLMs and VLMs | Dec 3, 2024 | Image Captioning, Quantization | Unverified | 0
DIR: Retrieval-Augmented Image Captioning with Comprehensive Understanding | Dec 2, 2024 | Caption Generation, Domain Generalization | Unverified | 0
Improving Multimodal LLMs Ability In Geometry Problem Solving, Reasoning, And Multistep Scoring | Dec 1, 2024 | Automated Theorem Proving, Geometry Problem Solving | Unverified | 0
Sparse Attention Vectors: Generative Multimodal Model Features Are Discriminative Vision-Language Classifiers | Nov 28, 2024 | Image Captioning, image-classification | Unverified | 0
OPCap:Object-aware Prompting Captioning | Nov 27, 2024 | Attribute, Decoder | Unverified | 0
Active Data Curation Effectively Distills Large-Scale Multimodal Models | Nov 27, 2024 | Decoder, Image Captioning | Unverified | 0
Efficient Multi-modal Large Language Models via Visual Token Grouping | Nov 26, 2024 | Image Captioning, Question Answering | Unverified | 0
Debiasing Classifiers by Amplifying Bias with Latent Diffusion and Large Language Models | Nov 25, 2024 | Attribute, Computational Efficiency | Unverified | 0
Chain of Attack: On the Robustness of Vision-Language Models Against Transfer-Based Adversarial Attacks | Nov 24, 2024 | Image Captioning, Natural Language Understanding | Unverified | 0
FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity | Nov 23, 2024 | Attribute, Cross-Modal Retrieval | Unverified | 0
Uterine Ultrasound Image Captioning Using Deep Learning Techniques | Nov 21, 2024 | Deep Learning, Descriptive | Unverified | 0
Mitigating Perception Bias: A Training-Free Approach to Enhance LMM for Image Quality Assessment | Nov 19, 2024 | Image Captioning, Image Quality Assessment | Unverified | 0
AI Flow at the Network Edge | Nov 19, 2024 | Image Captioning | Unverified | 0
The Power of Many: Multi-Agent Multimodal Models for Cultural Image Captioning | Nov 18, 2024 | Image Captioning | Code Available | 0
Learn from Downstream and Be Yourself in Multimodal Large Language Model Fine-Tuning | Nov 17, 2024 | Image Captioning, Language Modeling | Code Available | 0
MolParser: End-to-end Visual Recognition of Molecule Structures in the Wild | Nov 17, 2024 | Active Learning, Image Captioning | Unverified | 0
Cross-Modal Consistency in Multimodal Large Language Models | Nov 14, 2024 | Image Captioning, object-detection | Unverified | 0
Bridging the Visual Gap: Fine-Tuning Multimodal Models with Knowledge-Adapted Captions | Nov 13, 2024 | Descriptive, Hallucination | Code Available | 0
Grounded Video Caption Generation | Nov 12, 2024 | Caption Generation, Image Captioning | Unverified | 0
BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions | Nov 12, 2024 | Descriptive, Image Captioning | Unverified | 0
ViTOC: Vision Transformer and Object-aware Captioner | Nov 9, 2024 | Diversity, Image Captioning | Unverified | 0
Image2Text2Image: A Novel Framework for Label-Free Evaluation of Image-to-Text Generation with Text-to-Image Diffusion Models | Nov 8, 2024 | Image Captioning, Image Generation | Unverified | 0
Precision or Recall? An Analysis of Image Captions for Training Text-to-Image Generation Model | Nov 7, 2024 | Image Captioning, Image Generation | Code Available | 0
Seeing is Deceiving: Exploitation of Visual Pathways in Multi-Modal Language Models | Nov 7, 2024 | Adversarial Attack, Image Captioning | Unverified | 0
RS-MoE: Mixture of Experts for Remote Sensing Image Captioning and Visual Question Answering | Nov 3, 2024 | Descriptive, Image Captioning | Unverified | 0
Designing a Robust Radiology Report Generation System | Nov 2, 2024 | Decision Making, Diagnostic | Unverified | 0
Aggregate-and-Adapt Natural Language Prompts for Downstream Generalization of CLIP | Oct 31, 2024 | Image Captioning, Prompt Learning | Unverified | 0
Large Language Model Benchmarks in Medical Tasks | Oct 28, 2024 | Image Captioning, Language Modeling | Unverified | 0
Image Generation from Image Captioning -- Invertible Approach | Oct 26, 2024 | Image Captioning, Image Generation | Unverified | 0
Decoding Diffusion: A Scalable Framework for Unsupervised Analysis of Latent Space Biases and Representations Using Natural Language Prompts | Oct 25, 2024 | Denoising, Image Captioning | Unverified | 0
Backdoor in Seconds: Unlocking Vulnerabilities in Large Pre-trained Models via Model Editing | Oct 23, 2024 | Adversarial Attack, Backdoor Attack | Unverified | 0
Altogether: Image Captioning via Re-aligning Alt-text | Oct 22, 2024 | Image Captioning, image-classification | Unverified | 0
An Efficient System for Automatic Map Storytelling -- A Case Study on Historical Maps | Oct 21, 2024 | Image Captioning | Code Available | 0
VipAct: Visual-Perception Enhancement via Specialized VLM Agent Collaboration and Tool-use | Oct 21, 2024 | Image Captioning, Task Planning | Unverified | 0
MI-VisionShot: Few-shot adaptation of vision-language models for slide-level classification of histopathological images | Oct 21, 2024 | Few-Shot Learning, Image Captioning | Code Available | 0
Hiding-in-Plain-Sight (HiPS) Attack on CLIP for Targetted Object Removal from Images | Oct 16, 2024 | Image Captioning, Object | Unverified | 0
Self-adaptive Multimodal Retrieval-Augmented Generation | Oct 15, 2024 | Image Captioning, RAG | Code Available | 0
MMCFND: Multimodal Multilingual Caption-aware Fake News Detection for Low-resource Indic Languages | Oct 14, 2024 | Articles, Descriptive | Unverified | 0
CLIP-SCGI: Synthesized Caption-Guided Inversion for Person Re-Identification | Oct 12, 2024 | Image Captioning, Person Re-Identification | Unverified | 0
A Unified Debiasing Approach for Vision-Language Models across Modalities and Tasks | Oct 10, 2024 | Fairness, Image Captioning | Code Available | 0
An Eye for an Ear: Zero-shot Audio Description Leveraging an Image Captioner using Audiovisual Distribution Alignment | Oct 8, 2024 | Audio captioning, Contrastive Learning | Code Available | 0
Core Tokensets for Data-efficient Sequential Training of Transformers | Oct 8, 2024 | Image Captioning, image-classification | Code Available | 0
AnyAttack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models | Oct 7, 2024 | Image Captioning, Image-text Retrieval | Unverified | 0
CAPEEN: Image Captioning with Early Exits and Knowledge Distillation | Oct 6, 2024 | Descriptive, Image Captioning | Code Available | 0
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark | Oct 4, 2024 | Image Captioning, Video Understanding | Unverified | 0