Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner May 19, 2023 Dense Captioning Image Captioning
Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory Mar 19, 2024 Adversarial Text Diversity
Neural Fashion Image Captioning : Accounting for Data Diversity Jun 23, 2021 Decoder Diversity
NeuSyRE: Neuro-Symbolic Visual Understanding and Reasoning Framework based on Scene Graph Enrichment Nov 5, 2023 Caption Generation Common Sense Reasoning
EDSL: An Encoder-Decoder Architecture with Symbol-Level Features for Printed Mathematical Expression Recognition Jul 6, 2020 Decoder Image Captioning
Egoshots, an ego-vision life-logging dataset and semantic fidelity metric to evaluate diversity in image captioning models Mar 26, 2020 Diversity Image Captioning
Dual-branch Hybrid Learning Network for Unbiased Scene Graph Generation Jul 16, 2022 Graph Generation Image Captioning
On Realization of Intelligent Decision-Making in the Real World: A Foundation Decision Model Perspective Dec 24, 2022 Decision Making Image Captioning
DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only Training Mar 6, 2023 Decoder Image Captioning
Dual-Level Collaborative Transformer for Image Captioning Jan 16, 2021 Descriptive Image Captioning
Passage Retrieval for Outside-Knowledge Visual Question Answering May 9, 2021 Image Captioning Object
Paying Attention to Descriptions Generated by Image Captioning Models Apr 24, 2017 Image Captioning
Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts Apr 12, 2024 Image Captioning Question Answering
Distinctive Image Captioning: Leveraging Ground Truth Captions in CLIP Guided Reinforcement Learning Feb 21, 2024 Cross-Modal Retrieval Image Captioning
Disentangled Pre-training for Human-Object Interaction Detection Apr 2, 2024 Action Recognition Decoder
Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models Oct 7, 2016 Diversity Image Captioning
Neural Architecture Search using Deep Neural Networks and Monte Carlo Tree Search May 18, 2018 GPU Image Captioning
Position-guided Text Prompt for Vision-Language Pre-training Dec 19, 2022 Cross-Modal Retrieval Image Captioning
Concadia: Towards Image-Based Text Generation with a Purpose Apr 16, 2021 Image Captioning Image to text
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts Feb 17, 2021 Caption Generation Diversity
Progressive Tree-Structured Prototype Network for End-to-End Image Captioning Nov 17, 2022 Image Captioning
Confidence-aware Non-repetitive Multimodal Transformers for TextCaps Dec 7, 2020 Image Captioning Optical Character Recognition
Protect, Show, Attend and Tell: Empowering Image Captioning Models with Ownership Protection Aug 25, 2020 Image Captioning image-classification
Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring Image Segmentation Jul 10, 2024 Image Captioning Image Segmentation
Discovering Non-monotonic Autoregressive Orderings with Variational Inference Oct 27, 2021 Decoder Image Captioning
DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval Jun 10, 2025 Image Captioning Retrieval
Consensus-Aware Visual-Semantic Embedding for Image-Text Matching Jul 17, 2020 Image Captioning Image-text matching
RadAlign: Advancing Radiology Report Generation with Vision-Language Concept Alignment Jan 13, 2025 Concept Alignment Image Captioning
Diverse Image Captioning with Context-Object Split Latent Spaces Nov 2, 2020 Diversity Image Captioning
Diffusion Bridge: Leveraging Diffusion Model to Reduce the Modality Gap Between Text and Vision for Zero-Shot Image Captioning Jan 1, 2025 cross-modal alignment Denoising
Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization Aug 26, 2024 Descriptive Image Captioning
Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards Jun 7, 2023 Diversity Image Captioning
Differentially Private Representation Learning via Image Captioning Mar 4, 2024 Image Captioning Representation Learning
DiffX: Guide Your Layout to Cross-Modal Generative Modeling Jul 22, 2024 Denoising Image Captioning
COSMic: A Coherence-Aware Generation Metric for Image Descriptions Sep 11, 2021 Caption Generation Image Captioning
Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs Mar 1, 2020 Attribute Caption Generation
Detecting Hate Speech in Multi-modal Memes Dec 29, 2020 Binary Classification Hate Speech Detection
SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval Jan 24, 2024 Benchmarking Image Captioning
See, Think, Confirm: Interactive Prompting Between Vision and Language Models for Knowledge-based Visual Reasoning Jan 12, 2023 Few-Shot Learning Image Captioning
Self-critical Sequence Training for Image Captioning Dec 2, 2016 Image Captioning Policy Gradient Methods
ConTEXTual Net: A Multimodal Vision-Language Model for Segmentation of Pneumothorax Mar 2, 2023 Descriptive Image Captioning
Discovering Autoregressive Orderings with Variational Inference Jan 1, 2021 Code Generation Image Captioning
In Defense of Scene Graphs for Image Captioning Feb 9, 2021 Human-Object Interaction Detection Image Captioning
Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning Dec 15, 2023 Factual Inconsistency Detection in Chart Captioning Image Captioning
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention Feb 10, 2015 Caption Generation Image Captioning
Show, Deconfound and Tell: Image Captioning With Causal Inference Jan 1, 2022 Causal Inference Decoder
Contrastive Vision-Language Alignment Makes Efficient Instruction Learner Nov 29, 2023 Contrastive Learning Image Captioning
Sieve: Multimodal Dataset Pruning Using Image Captioning Models Oct 3, 2023 Diversity Image Captioning
ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation Dec 31, 2021 Image Captioning Image Generation
GAIA: A Global, Multi-modal, Multi-scale Vision-Language Dataset for Remote Sensing Image Analysis Feb 13, 2025 Cross-Modal Retrieval Image Captioning