FG-CXR: A Radiologist-Aligned Gaze Dataset for Enhancing Interpretability in Chest X-Ray Report Generation (Nov 23, 2024). Tags: Anatomy, Image Captioning. Code available.
ChatEarthNet: A Global-Scale Image-Text Dataset Empowering Vision-Language Geo-Foundation Models (Feb 17, 2024). Tags: Earth Observation, Image Captioning. Code available.
Chart-to-Text: A Large-Scale Benchmark for Chart Summarization (Mar 12, 2022). Tags: Data-to-Text Generation, Image Captioning. Code available.
CLIP-Diffusion-LM: Apply Diffusion Model on Image Captioning (Oct 10, 2022). Tags: Decoder, Denoising. Code available.
Expressive Scene Graph Generation Using Commonsense Knowledge Infusion for Visual Understanding and Reasoning (May 31, 2022). Tags: Common Sense Reasoning, Graph Generation. Code available.
Evolving Deep Neural Networks (Mar 1, 2017). Tags: Deep Learning, Image Captioning. Code available.
Evaluating Multimodal Representations on Visual Semantic Textual Similarity (Apr 4, 2020). Tags: Benchmarking, Image Captioning. Code available.
Exchanging-based Multimodal Fusion with Transformer (Sep 5, 2023). Tags: Image Captioning, Image Generation. Code available.
An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA (Sep 10, 2021). Tags: Image Captioning, Question Answering. Code available.
Can images help recognize entities? A study of the role of images for Multimodal NER (Oct 23, 2020). Tags: Image Captioning, named-entity-recognition. Code available.
Exploiting Multiple Sequence Lengths in Fast End to End Training for Image Captioning (Aug 13, 2022). Tags: Image Captioning. Code available.
Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts (Apr 12, 2024). Tags: Image Captioning, Question Answering. Code available.
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner (May 19, 2023). Tags: Dense Captioning, Image Captioning. Code available.
A neural attention model for speech command recognition (Aug 27, 2018). Tags: Image Captioning, model. Code available.
ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation (Dec 31, 2021). Tags: Image Captioning, Image Generation. Code available.
CIDEr: Consensus-based Image Description Evaluation (Nov 20, 2014). Tags: Action Recognition, Attribute. Code available.
Can Audio Captions Be Evaluated with Image Caption Metrics? (Oct 10, 2021). Tags: AudioCaps, Audio captioning. Code available.
FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks (Mar 4, 2023). Tags: Cross-Modal Retrieval, Image Captioning. Code available.
CAPIVARA: Cost-Efficient Approach for Improving Multilingual CLIP Performance on Low-Resource Languages (Oct 20, 2023). Tags: Diversity, GPU. Code available.
CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation (Aug 29, 2023). Tags: Image Captioning, Machine Translation. Code available.
Instruction-guided Multi-Granularity Segmentation and Captioning with Large Multimodal Model (Sep 20, 2024). Tags: Image Captioning, Panoptic Segmentation. Code available.
Fooling Contrastive Language-Image Pre-trained Models with CLIPMasterPrints (Jul 7, 2023). Tags: Image Captioning, Image Retrieval. Code available.
CNN+CNN: Convolutional Decoders for Image Captioning (May 23, 2018). Tags: Image Captioning, Sentence. Code available.
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone (Jun 15, 2022). Tags: Described Object Detection, Image Captioning. Code available.
Exploring Discrete Diffusion Models for Image Captioning (Nov 21, 2022). Tags: Image Captioning, Image Generation. Code available.
FLEUR: An Explainable Reference-Free Evaluation Metric for Image Captioning Using a Large Multimodal Model (Jun 10, 2024). Tags: Image Captioning. Code available.
ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning (Oct 23, 2024). Tags: Image Captioning, Instruction Following. Code available.
Analysis of diversity-accuracy tradeoff in image captioning (Feb 27, 2020). Tags: Diversity, Image Captioning. Code available.
CaMEL: Mean Teacher Learning for Image Captioning (Feb 21, 2022). Tags: Image Captioning, Knowledge Distillation. Code available.
Learning to Generate Grounded Visual Captions without Localization Supervision (Jun 1, 2019). Tags: Image Captioning, Language Modelling. Code available.
Analog Bits: Generating Discrete Data using Diffusion Models with Self-Conditioning (Aug 8, 2022). Tags: Image Captioning, Image Generation. Code available.
End-to-End Supermask Pruning: Learning to Prune Image Captioning Models (Oct 7, 2021). Tags: Decoder, Image Captioning. Code available.
Bridging the Domain Gap: Self-Supervised 3D Scene Understanding with Foundation Models (May 15, 2023). Tags: 3D Object Detection, Image Captioning. Code available.
BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues (Jul 29, 2024). Tags: Image Captioning. Code available.
EDSL: An Encoder-Decoder Architecture with Symbol-Level Features for Printed Mathematical Expression Recognition (Jul 6, 2020). Tags: Decoder, Image Captioning. Code available.
Brain Captioning: Decoding human brain activity into images and text (May 19, 2023). Tags: Brain Decoding, Depth Estimation. Code available.
CgT-GAN: CLIP-guided Text GAN for Image Captioning (Aug 23, 2023). Tags: Image Captioning. Code available.
Egoshots, an ego-vision life-logging dataset and semantic fidelity metric to evaluate diversity in image captioning models (Mar 26, 2020). Tags: Diversity, Image Captioning. Code available.
End-to-End Transformer Based Model for Image Captioning (Mar 29, 2022). Tags: Decoder, Image Captioning. Code available.
Exploring Diverse In-Context Configurations for Image Captioning (May 24, 2023). Tags: Image Captioning, In-Context Learning. Code available.
Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models (Oct 7, 2016). Tags: Diversity, Image Captioning. Code available.
Diverse Image Captioning with Context-Object Split Latent Spaces (Nov 2, 2020). Tags: Diversity, Image Captioning. Code available.
Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory (Mar 19, 2024). Tags: Adversarial Text, Diversity. Code available.
Neural Architecture Search using Deep Neural Networks and Monte Carlo Tree Search (May 18, 2018). Tags: GPU, Image Captioning. Code available.
Distinctive Image Captioning: Leveraging Ground Truth Captions in CLIP Guided Reinforcement Learning (Feb 21, 2024). Tags: Cross-Modal Retrieval, Image Captioning. Code available.
Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning (Dec 15, 2023). Tags: Factual Inconsistency Detection in Chart Captioning, Image Captioning. Code available.
DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval (Jun 10, 2025). Tags: Image Captioning, Retrieval. Code available.
Bi-LORA: A Vision-Language Approach for Synthetic Image Detection (Apr 2, 2024). Tags: Binary Classification, Image Captioning. Code available.
Boostlet.js: Image processing plugins for the web via JavaScript injection (May 13, 2024). Tags: Data Visualization, Image Captioning. Code available.
Disentangled Pre-training for Human-Object Interaction Detection (Apr 2, 2024). Tags: Action Recognition, Decoder. Code available.