Zero-Shot Video Captioning with Evolving Pseudo-Tokens | Jul 22, 2022 | Image Captioning, Image-text matching | Code available
GRIT: Faster and Better Image captioning Transformer Using Dual Visual Features | Jul 20, 2022 | Image Captioning | Code available
Dual-branch Hybrid Learning Network for Unbiased Scene Graph Generation | Jul 16, 2022 | Graph Generation, Image Captioning | Code available
Detecting and Recovering Sequential DeepFake Manipulation | Jul 5, 2022 | DeepFake Detection, Face Swapping | Code available
What is Where by Looking: Weakly-Supervised Open-World Phrase-Grounding without Text Inputs | Jun 19, 2022 | Benchmarking, Image Captioning | Code available
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone | Jun 15, 2022 | Described Object Detection, Image Captioning | Code available
Expressive Scene Graph Generation Using Commonsense Knowledge Infusion for Visual Understanding and Reasoning | May 31, 2022 | Common Sense Reasoning, Graph Generation | Code available
Mutual Information Divergence: A Unified Metric for Multimodal Generative Models | May 25, 2022 | Hallucination Pair-wise Detection (1-ref), Hallucination Pair-wise Detection (4-ref) | Code available
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections | May 24, 2022 | Computational Efficiency, cross-modal alignment | Code available
Beyond Greedy Search: Tracking by Multi-Agent Reinforcement Learning-based Beam Search | May 19, 2022 | Decision Making, Image Captioning | Code available
Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning | May 9, 2022 | Image Captioning, Object | Code available
CoCa: Contrastive Captioners are Image-Text Foundation Models | May 4, 2022 | Action Classification, Decoder | Code available
Image Captioning In the Transformer Age | Apr 15, 2022 | Decoder, Image Captioning | Code available
It is Okay to Not Be Okay: Overcoming Emotional Bias in Affective Image Captioning by Contrastive Data Collection | Apr 15, 2022 | Image Captioning | Code available
End-to-End Transformer Based Model for Image Captioning | Mar 29, 2022 | Decoder, Image Captioning | Code available
Quantifying Societal Bias Amplification in Image Captioning | Mar 29, 2022 | Attribute, Image Captioning | Code available
Linking Emergent and Natural Languages via Corpus Transfer | Mar 24, 2022 | Attribute, Disentanglement | Code available
On Vision Features in Multimodal Machine Translation | Mar 17, 2022 | Image Captioning, Machine Translation | Code available
Chart-to-Text: A Large-Scale Benchmark for Chart Summarization | Mar 12, 2022 | Data-to-Text Generation, Image Captioning | Code available
FS-COCO: Towards Understanding of Freehand Sketches of Common Objects in Context | Mar 4, 2022 | Decoder, Image Captioning | Code available
CaMEL: Mean Teacher Learning for Image Captioning | Feb 21, 2022 | Image Captioning, Knowledge Distillation | Code available
ACORT: A Compact Object Relation Transformer for Parameter Efficient Image Captioning | Feb 11, 2022 | Image Captioning, Relation | Code available
Compact Bidirectional Transformer for Image Captioning | Jan 6, 2022 | Decoder, Image Captioning | Code available
DeeCap: Dynamic Early Exiting for Efficient Image Captioning | Jan 1, 2022 | Image Captioning, Imitation Learning | Code available
Show, Deconfound and Tell: Image Captioning With Causal Inference | Jan 1, 2022 | Causal Inference, Decoder | Code available
ERNIE-ViLG: Unified Generative Pre-training for Bidirectional Vision-Language Generation | Dec 31, 2021 | Image Captioning, Image Generation | Code available
VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks | Dec 13, 2021 | Image Captioning, Transfer Learning | Code available
Injecting Semantic Concepts into End-to-End Image Captioning | Dec 9, 2021 | Caption Generation, Image Captioning | Code available
Bidimensional Leaderboards: Generate and Evaluate Language Hand in Hand | Dec 8, 2021 | Image Captioning, Machine Translation | Code available
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic | Nov 29, 2021 | Contrastive Learning, Descriptive | Code available
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling | Nov 23, 2021 | Image Captioning, Image Description | Code available
L-Verse: Bidirectional Generation Between Image and Text | Nov 22, 2021 | Image Captioning, Image Generation | Code available
Transparent Human Evaluation for Image Captioning | Nov 17, 2021 | Image Captioning | Code available
Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts | Nov 16, 2021 | Cross-Modal Retrieval, Image Captioning | Code available
Discovering Non-monotonic Autoregressive Orderings with Variational Inference | Oct 27, 2021 | Decoder, Image Captioning | Code available
SciCap: Generating Captions for Scientific Figures | Oct 22, 2021 | Articles, Image Captioning | Code available
A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models | Oct 16, 2021 | Image Captioning, Language Modeling | Code available
Can Audio Captions Be Evaluated with Image Caption Metrics? | Oct 10, 2021 | AudioCaps, Audio captioning | Code available
End-to-End Supermask Pruning: Learning to Prune Image Captioning Models | Oct 7, 2021 | Decoder, Image Captioning | Code available
Let there be a clock on the beach: Reducing Object Hallucination in Image Captioning | Oct 4, 2021 | Hallucination, Image Captioning | Code available
COSMic: A Coherence-Aware Generation Metric for Image Descriptions | Sep 11, 2021 | Caption Generation, Image Captioning | Code available
An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA | Sep 10, 2021 | Image Captioning, Question Answering | Code available
GeneAnnotator: A Semi-automatic Annotation Tool for Visual Scene Graph | Sep 6, 2021 | Graph Generation, Graph Learning | Code available
Zero-shot Natural Language Video Localization | Aug 29, 2021 | Image Captioning | Code available
Automatic Text Evaluation through the Lens of Wasserstein Barycenters | Aug 27, 2021 | Image Captioning, Machine Translation | Code available
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | Aug 24, 2021 | Image Captioning, Language Modeling | Code available
X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics | Aug 18, 2021 | Cross-Modal Retrieval, Decoder | Code available
Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision | Aug 12, 2021 | 3D geometry, Descriptive | Code available
Question-controlled Text-aware Image Captioning | Aug 4, 2021 | Decoder, Image Captioning | Code available
UMIC: An Unreferenced Metric for Image Captioning via Contrastive Learning | Jun 26, 2021 | Contrastive Learning, Diversity | Code available