| Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning | Dec 31, 2024 | Caption Generation, Decoder | —Unverified | 0 | 0 |
| Unpaired Cross-lingual Image Caption Generation with Self-Supervised Rewards | Aug 15, 2019 | Caption Generation, Image Captioning | —Unverified | 0 | 0 |
| UNISON: Unpaired Cross-lingual Image Captioning | Oct 3, 2020 | Caption Generation, Image Captioning | —Unverified | 0 | 0 |
| ViCo: Engaging Video Comment Generation with Human Preference Rewards | Aug 22, 2023 | Caption Generation, Comment Generation | —Unverified | 0 | 0 |
| Video Caption Dataset for Describing Human Actions in Japanese | Mar 10, 2020 | Caption Generation | —Unverified | 0 | 0 |
| Video Captioning in Compressed Video | Jan 2, 2021 | Caption Generation, Video Captioning | —Unverified | 0 | 0 |
| Video Captioning with Guidance of Multimodal Latent Topics | Aug 31, 2017 | Caption Generation, Decoder | —Unverified | 0 | 0 |
| Vision-Language Modeling Meets Remote Sensing: Models, Datasets and Perspectives | May 20, 2025 | Caption Generation, Contrastive Learning | —Unverified | 0 | 0 |
| Visual Analytics for Efficient Image Exploration and User-Guided Image Captioning | Nov 2, 2023 | Caption Generation, Efficient Exploration | —Unverified | 0 | 0 |
| Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation | Apr 30, 2024 | Caption Generation, Hallucination | —Unverified | 0 | 0 |
| WAT2019: English-Hindi Translation on Hindi Visual Genome Dataset | Nov 1, 2019 | Caption Generation, Translation | —Unverified | 0 | 0 |
| Denoising Large-Scale Image Captioning from Alt-text Data using Content Selection Models | Sep 10, 2020 | Caption Generation, Denoising | —Unverified | 0 | 0 |
| Weakly Supervised Dense Video Captioning via Jointly Usage of Knowledge Distillation and Cross-modal Matching | May 18, 2021 | Caption Generation, Cross-Modal Retrieval | —Unverified | 0 | 0 |
| What is not where: the challenge of integrating spatial representations into deep learning architectures | Jul 21, 2018 | Caption Generation, Deep Learning | —Unverified | 0 | 0 |
| Word to Sentence Visual Semantic Similarity for Caption Generation: Lessons Learned | Sep 26, 2022 | Caption Generation, Semantic Similarity | —Unverified | 0 | 0 |
| XMeCap: Meme Caption Generation with Sub-Image Adaptability | Jul 24, 2024 | Caption Generation, Meme Captioning | —Unverified | 0 | 0 |
| YouMakeup: A Large-Scale Domain-Specific Multimodal Dataset for Fine-Grained Semantic Comprehension | Nov 1, 2019 | Caption Generation, Question Answering | —Unverified | 0 | 0 |
| 3G structure for image caption generation | Apr 21, 2019 | Caption Generation, Sentence | —Unverified | 0 | 0 |
| 3M: Multi-style image caption generation using Multi-modality features under Multi-UPDOWN model | Mar 20, 2021 | Caption Generation, Image Captioning | —Unverified | 0 | 0 |
| A Comparative Study of Pre-trained CNNs and GRU-Based Attention for Image Caption Generation | Oct 11, 2023 | Caption Generation, Decoder | —Unverified | 0 | 0 |
| A Deep Neural Framework for Image Caption Generation Using GRU-Based Attention Mechanism | Mar 3, 2022 | Caption Generation, Decoder | —Unverified | 0 | 0 |
| Advancing Large Multi-modal Models with Explicit Chain-of-Reasoning and Visual Question Generation | Jan 18, 2024 | Caption Generation, Language Modeling | —Unverified | 0 | 0 |
| AIC-AB NET: A Neural Network for Image Captioning with Spatial Attention and Text Attributes | Jul 14, 2023 | Attribute, Caption Generation | —Unverified | 0 | 0 |
| Auto-ACD: A Large-scale Dataset for Audio-Language Representation Learning | Sep 20, 2023 | Audio captioning, Caption Generation | —Unverified | 0 | 0 |
| Aligning Images and Text with Semantic Role Labels for Fine-Grained Cross-Modal Understanding | Jun 1, 2022 | Caption Generation, Image Retrieval | —Unverified | 0 | 0 |