FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions | May 28, 2023 | Attribute, Image Captioning | Code Available (15)
GeneAnnotator: A Semi-automatic Annotation Tool for Visual Scene Graph | Sep 6, 2021 | Graph Generation, Graph Learning | Code Available (15)
GIT-Mol: A Multi-modal Large Language Model for Molecular Science with Graph, Image, and Text | Aug 14, 2023 | Drug Discovery, Image Captioning | Code Available (15)
Fooling Contrastive Language-Image Pre-trained Models with CLIPMasterPrints | Jul 7, 2023 | Image Captioning, Image Retrieval | Code Available (15)
CLIPScore: A Reference-free Evaluation Metric for Image Captioning | Apr 18, 2021 | Hallucination Pair-wise Detection (1-ref), Hallucination Pair-wise Detection (4-ref) | Code Available (15)
From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping | Apr 26, 2023 | Decoder, Image Captioning | Code Available (15)
CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation | Aug 29, 2023 | Image Captioning, Machine Translation | Code Available (15)
CIDEr: Consensus-based Image Description Evaluation | Nov 20, 2014 | Action Recognition, Attribute | Code Available (15)
FS-COCO: Towards Understanding of Freehand Sketches of Common Objects in Context | Mar 4, 2022 | Decoder, Image Captioning | Code Available (15)
Chart-to-Text: A Large-Scale Benchmark for Chart Summarization | Mar 12, 2022 | Data-to-Text Generation, Image Captioning | Code Available (15)
Fashion Captioning: Towards Generating Accurate Descriptions with Semantic Rewards | Aug 6, 2020 | Attribute, Image Captioning | Code Available (15)
FG-CXR: A Radiologist-Aligned Gaze Dataset for Enhancing Interpretability in Chest X-Ray Report Generation | Nov 23, 2024 | Anatomy, Image Captioning | Code Available (15)
Expressive Scene Graph Generation Using Commonsense Knowledge Infusion for Visual Understanding and Reasoning | May 31, 2022 | Common Sense Reasoning, Graph Generation | Code Available (15)
FACTUAL: A Benchmark for Faithful and Consistent Textual Scene Graph Parsing | May 27, 2023 | Graph Similarity, Human Judgment Correlation | Code Available (15)
A Survey on Efficient Vision-Language Models | Apr 13, 2025 | Image Captioning, Question Answering | Code Available (15)
Can Audio Captions Be Evaluated with Image Caption Metrics? | Oct 10, 2021 | AudioCaps, Audio captioning | Code Available (15)
Exploiting Multiple Sequence Lengths in Fast End to End Training for Image Captioning | Aug 13, 2022 | Image Captioning | Code Available (15)
Evolving Deep Neural Networks | Mar 1, 2017 | Deep Learning, Image Captioning | Code Available (15)
Adapting Grad-CAM for Embedding Networks | Jan 17, 2020 | Image Captioning, image-classification | Code Available (15)
Exchanging-based Multimodal Fusion with Transformer | Sep 5, 2023 | Image Captioning, Image Generation | Code Available (15)
ChatEarthNet: A Global-Scale Image-Text Dataset Empowering Vision-Language Geo-Foundation Models | Feb 17, 2024 | Earth Observation, Image Captioning | Code Available (15)
FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks | Mar 4, 2023 | Cross-Modal Retrieval, Image Captioning | Code Available (15)
CgT-GAN: CLIP-guided Text GAN for Image Captioning | Aug 23, 2023 | Image Captioning | Code Available (15)
Exploring Discrete Diffusion Models for Image Captioning | Nov 21, 2022 | Image Captioning, Image Generation | Code Available (15)
ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation | Dec 31, 2021 | Image Captioning, Image Generation | Code Available (15)
CaMEL: Mean Teacher Learning for Image Captioning | Feb 21, 2022 | Image Captioning, Knowledge Distillation | Code Available (15)
FLEUR: An Explainable Reference-Free Evaluation Metric for Image Captioning Using a Large Multimodal Model | Jun 10, 2024 | Image Captioning | Code Available (15)
Bridging the Domain Gap: Self-Supervised 3D Scene Understanding with Foundation Models | May 15, 2023 | 3D Object Detection, Image Captioning | Code Available (15)
BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues | Jul 29, 2024 | Image Captioning | Code Available (15)
CAPIVARA: Cost-Efficient Approach for Improving Multilingual CLIP Performance on Low-Resource Languages | Oct 20, 2023 | Diversity, GPU | Code Available (15)
Evaluating Multimodal Representations on Visual Semantic Textual Similarity | Apr 4, 2020 | Benchmarking, Image Captioning | Code Available (15)
CLIP-Diffusion-LM: Apply Diffusion Model on Image Captioning | Oct 10, 2022 | Decoder, Denoising | Code Available (15)
Neural Architecture Search using Deep Neural Networks and Monte Carlo Tree Search | May 18, 2018 | GPU, Image Captioning | Code Available (15)
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone | Jun 15, 2022 | Described Object Detection, Image Captioning | Code Available (15)
GAIA: A Global, Multi-modal, Multi-scale Vision-Language Dataset for Remote Sensing Image Analysis | Feb 13, 2025 | Cross-Modal Retrieval, Image Captioning | Code Available (15)
CNN+CNN: Convolutional Decoders for Image Captioning | May 23, 2018 | Image Captioning, Sentence | Code Available (15)
Exploring Diverse In-Context Configurations for Image Captioning | May 24, 2023 | Image Captioning, In-Context Learning | Code Available (15)
PAINT: Paying Attention to INformed Tokens to Mitigate Hallucination in Large Vision-Language Model | Jan 21, 2025 | Hallucination, Image Captioning | Code Available (15)
COCO-Stuff: Thing and Stuff Classes in Context | Dec 12, 2016 | Image Captioning, Semantic Segmentation | Code Available (15)
CoCa: Contrastive Captioners are Image-Text Foundation Models | May 4, 2022 | Action Classification, Decoder | Code Available (15)
Egoshots, an ego-vision life-logging dataset and semantic fidelity metric to evaluate diversity in image captioning models | Mar 26, 2020 | Diversity, Image Captioning | Code Available (15)
A large annotated corpus for learning natural language inference | Aug 21, 2015 | Image Captioning, Natural Language Inference | Code Available (15)
Graph Optimal Transport for Cross-Domain Alignment | Jun 26, 2020 | Graph Matching, Image Captioning | Code Available (15)
GRIT: Faster and Better Image captioning Transformer Using Dual Visual Features | Jul 20, 2022 | Image Captioning | Code Available (15)
AutoAD: Movie Description in Context | Mar 29, 2023 | Image Captioning, Text Generation | Code Available (15)
Comprehensive Image Captioning via Scene Graph Decomposition | Jul 23, 2020 | Diversity, Image Captioning | Code Available (15)
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts | Feb 17, 2021 | Caption Generation, Diversity | Code Available (15)
Confidence-aware Non-repetitive Multimodal Transformers for TextCaps | Dec 7, 2020 | Image Captioning, Optical Character Recognition | Code Available (15)
Analog Bits: Generating Discrete Data using Diffusion Models with Self-Conditioning | Aug 8, 2022 | Image Captioning, Image Generation | Code Available (15)
Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory | Mar 19, 2024 | Adversarial Text, Diversity | Code Available (15)