A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models Oct 16, 2021 Image Captioning Language Modeling
MAPL: Parameter-Efficient Adaptation of Unimodal Pre-Trained Models for Vision-Language Few-Shot Prompting Oct 13, 2022 Image Captioning Question Answering
Kosmos-2: Grounding Multimodal Large Language Models to the World Jun 26, 2023 Image Captioning In-Context Learning
Describe What to Change: A Text-guided Unsupervised Image-to-Image Translation Approach Aug 10, 2020 Attribute Image Captioning
DeltaNet:Conditional Medical Report Generation for COVID-19 Diagnosis Nov 12, 2022 COVID-19 Diagnosis Decoder
MemeCap: A Dataset for Captioning and Interpreting Memes May 23, 2023 Image Captioning Meme Captioning
Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA May 13, 2020 Image Captioning Multi-Label Classification
Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training Jan 4, 2024 Descriptive Image Captioning
Dense Relational Image Captioning via Multi-task Triple-Stream Networks Oct 8, 2020 Graph Generation Image Captioning
Dense Relational Captioning: Triple-Stream Networks for Relationship-Based Captioning Mar 14, 2019 Diversity Image Captioning
BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues Jul 29, 2024 Image Captioning
Discovering Non-monotonic Autoregressive Orderings with Variational Inference Oct 27, 2021 Decoder Image Captioning
Detecting and Recovering Sequential DeepFake Manipulation Jul 5, 2022 DeepFake Detection Face Swapping
CgT-GAN: CLIP-guided Text GAN for Image Captioning Aug 23, 2023 Image Captioning
A large annotated corpus for learning natural language inference Aug 21, 2015 Image Captioning Natural Language Inference
Discovering Autoregressive Orderings with Variational Inference Jan 1, 2021 Code Generation Image Captioning
DiffX: Guide Your Layout to Cross-Modal Generative Modeling Jul 22, 2024 Denoising Image Captioning
Diffusion Bridge: Leveraging Diffusion Model to Reduce the Modality Gap Between Text and Vision for Zero-Shot Image Captioning Jan 1, 2025 cross-modal alignment Denoising
ConTEXTual Net: A Multimodal Vision-Language Model for Segmentation of Pneumothorax Mar 2, 2023 Descriptive Image Captioning
Brain Captioning: Decoding human brain activity into images and text May 19, 2023 Brain Decoding Depth Estimation
ChatEarthNet: A Global-Scale Image-Text Dataset Empowering Vision-Language Geo-Foundation Models Feb 17, 2024 Earth Observation Image Captioning
ConvNet Architecture Search for Spatiotemporal Feature Learning Aug 16, 2017 Action Classification Action Recognition
Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models Oct 7, 2016 Diversity Image Captioning
Diverse Image Captioning with Context-Object Split Latent Spaces Nov 2, 2020 Diversity Image Captioning
Mutual Information Divergence: A Unified Metric for Multimodal Generative Models May 25, 2022 Hallucination Pair-wise Detection (1-ref) Hallucination Pair-wise Detection (4-ref)
Myriad: Large Multimodal Model by Applying Vision Experts for Industrial Anomaly Detection Oct 29, 2023 Anomaly Detection Image Captioning
It is Okay to Not Be Okay: Overcoming Emotional Bias in Affective Image Captioning by Contrastive Data Collection Apr 15, 2022 Image Captioning
Adapting Grad-CAM for Embedding Networks Jan 17, 2020 Image Captioning image-classification
LaB-RAG: Label Boosted Retrieval Augmented Generation for Radiology Report Generation Nov 25, 2024 Image Captioning RAG
Dual-branch Hybrid Learning Network for Unbiased Scene Graph Generation Jul 16, 2022 Graph Generation Image Captioning
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering Jul 25, 2017 Image Captioning Visual Question Answering
Noise-aware Learning from Web-crawled Image-Text Data for Image Captioning Dec 27, 2022 Image Captioning Image Retrieval
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions Dec 14, 2023 Image Captioning
CLIP-Diffusion-LM: Apply Diffusion Model on Image Captioning Oct 10, 2022 Decoder Denoising
InfMLLM: A Unified Framework for Visual-Language Tasks Nov 12, 2023 GPU Image Captioning
A Survey on Efficient Vision-Language Models Apr 13, 2025 Image Captioning Question Answering
CLIPScore: A Reference-free Evaluation Metric for Image Captioning Apr 18, 2021 Hallucination Pair-wise Detection (1-ref) Hallucination Pair-wise Detection (4-ref)
CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation Aug 29, 2023 Image Captioning Machine Translation
Bootstrapping Interactive Image-Text Alignment for Remote Sensing Image Captioning Dec 2, 2023 Causal Language Modeling Contrastive Learning
Boostlet.js: Image processing plugins for the web via JavaScript injection May 13, 2024 Data Visualization Image Captioning
Consensus-Aware Visual-Semantic Embedding for Image-Text Matching Jul 17, 2020 Image Captioning Image-text matching
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone Jun 15, 2022 Described Object Detection Image Captioning
COBRA: Contrastive Bi-Modal Representation Algorithm May 7, 2020 Cross-Modal Retrieval Image Captioning
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner May 19, 2023 Dense Captioning Image Captioning
CoCa: Contrastive Captioners are Image-Text Foundation Models May 4, 2022 Action Classification Decoder
Paying Attention to Descriptions Generated by Image Captioning Models Apr 24, 2017 Image Captioning
InfoMetIC: An Informative Metric for Reference-free Image Caption Evaluation May 10, 2023 Benchmarking Image Captioning
Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory Mar 19, 2024 Adversarial Text Diversity
Confidence-aware Non-repetitive Multimodal Transformers for TextCaps Dec 7, 2020 Image Captioning Optical Character Recognition
Improving Image Captioning with Better Use of Captions Jun 21, 2020 Caption Generation Image Captioning