Learning to Generate Grounded Visual Captions without Localization Supervision Jun 1, 2019 Image Captioning Language Modelling
Latent Normalizing Flows for Many-to-Many Cross-Domain Mappings Feb 16, 2020 Image Captioning Image Generation
Analysis of diversity-accuracy tradeoff in image captioning Feb 27, 2020 Diversity Image Captioning
DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only Training Mar 6, 2023 Decoder Image Captioning
GIT-Mol: A Multi-modal Large Language Model for Molecular Science with Graph, Image, and Text Aug 14, 2023 Drug Discovery Image Captioning
Learning to Generate Grounded Visual Captions without Localization Supervision Aug 1, 2020 Image Captioning Language Modelling
CLIPScore: A Reference-free Evaluation Metric for Image Captioning Apr 18, 2021 Hallucination Pair-wise Detection (1-ref) Hallucination Pair-wise Detection (4-ref)
Linearly Mapping from Image to Text Space Sep 30, 2022 Image Captioning Image to text
Fooling Contrastive Language-Image Pre-trained Models with CLIPMasterPrints Jul 7, 2023 Image Captioning Image Retrieval
AVCap: Leveraging Audio-Visual Features as Text Tokens for Captioning Jul 10, 2024 Audio-Visual Captioning Image Captioning
CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation Aug 29, 2023 Image Captioning Machine Translation
CLIP-Diffusion-LM: Apply Diffusion Model on Image Captioning Oct 10, 2022 Decoder Denoising
DeltaNet:Conditional Medical Report Generation for COVID-19 Diagnosis Nov 12, 2022 COVID-19 Diagnosis Decoder
CNN+CNN: Convolutional Decoders for Image Captioning May 23, 2018 Image Captioning Sentence
Can images help recognize entities? A study of the role of images for Multimodal NER Oct 23, 2020 Image Captioning named-entity-recognition
Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA May 13, 2020 Image Captioning Multi-Label Classification
An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA Sep 10, 2021 Image Captioning Question Answering
Dense Relational Image Captioning via Multi-task Triple-Stream Networks Oct 8, 2020 Graph Generation Image Captioning
CoCa: Contrastive Captioners are Image-Text Foundation Models May 4, 2022 Action Classification Decoder
MAPL: Parameter-Efficient Adaptation of Unimodal Pre-Trained Models for Vision-Language Few-Shot Prompting Oct 13, 2022 Image Captioning Question Answering
IC3: Image Captioning by Committee Consensus Feb 2, 2023 Image Captioning
Bayesian Recurrent Neural Networks Apr 10, 2017 Image Captioning Language Modelling
Detecting Hate Speech in Multi-modal Memes Dec 29, 2020 Binary Classification Hate Speech Detection
Belief Revision based Caption Re-ranker with Visual Semantic Information Sep 16, 2022 Caption Generation Image Captioning
Differentially Private Representation Learning via Image Captioning Mar 4, 2024 Image Captioning Representation Learning
Diffusion Bridge: Leveraging Diffusion Model to Reduce the Modality Gap Between Text and Vision for Zero-Shot Image Captioning Jan 1, 2025 cross-modal alignment Denoising
GAIA: A Global, Multi-modal, Multi-scale Vision-Language Dataset for Remote Sensing Image Analysis Feb 13, 2025 Cross-Modal Retrieval Image Captioning
Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning Dec 11, 2024 Attribute Benchmarking
A large annotated corpus for learning natural language inference Aug 21, 2015 Image Captioning Natural Language Inference
BERTGEN: Multi-task Generation through BERT Jun 7, 2021 Decoder Image Captioning
Gated Hierarchical Attention for Image Captioning Oct 30, 2018 Decoder Image Captioning
Mitigating Gender Bias in Captioning Systems Jun 15, 2020 Benchmarking Gender Prediction
Disentangled Pre-training for Human-Object Interaction Detection Apr 2, 2024 Action Recognition Decoder
Distinctive Image Captioning: Leveraging Ground Truth Captions in CLIP Guided Reinforcement Learning Feb 21, 2024 Cross-Modal Retrieval Image Captioning
Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning May 9, 2022 Image Captioning Object
Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models Oct 7, 2016 Diversity Image Captioning
ChatEarthNet: A Global-Scale Image-Text Dataset Empowering Vision-Language Geo-Foundation Models Feb 17, 2024 Earth Observation Image Captioning
Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model Aug 2, 2023 Hallucination Image Captioning
Beyond Greedy Search: Tracking by Multi-Agent Reinforcement Learning-based Beam Search May 19, 2022 Decision Making Image Captioning
Aesthetically Relevant Image Captioning Nov 25, 2022 Image Captioning Sentence
MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning Aug 25, 2023 Image Captioning Video Captioning
Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts Nov 16, 2021 Cross-Modal Retrieval Image Captioning
Instruction-guided Multi-Granularity Segmentation and Captioning with Large Multimodal Model Sep 20, 2024 Image Captioning Panoptic Segmentation
FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions May 28, 2023 Attribute Image Captioning
GeneAnnotator: A Semi-automatic Annotation Tool for Visual Scene Graph Sep 6, 2021 Graph Generation Graph Learning
Bidimensional Leaderboards: Generate and Evaluate Language Hand in Hand Dec 8, 2021 Image Captioning Machine Translation
Dual-Level Collaborative Transformer for Image Captioning Jan 16, 2021 Descriptive Image Captioning
EDSL: An Encoder-Decoder Architecture with Symbol-Level Features for Printed Mathematical Expression Recognition Jul 6, 2020 Decoder Image Captioning
Mutual Information Divergence: A Unified Metric for Multimodal Generative Models May 25, 2022 Hallucination Pair-wise Detection (1-ref) Hallucination Pair-wise Detection (4-ref)
CgT-GAN: CLIP-guided Text GAN for Image Captioning Aug 23, 2023 Image Captioning