| Stack-VS: Stacked Visual-Semantic Attention for Image Caption Generation | Sep 5, 2019 | Attribute, Caption Generation | Unverified | 0 |
| Structural and Functional Decomposition for Personality Image Captioning in a Communication Game | Nov 17, 2020 | Caption Generation, Image Captioning | Unverified | 0 |
| StyleNet: Generating Attractive Visual Captions With Styles | Jul 1, 2017 | Caption Generation | Unverified | 0 |
| Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models | Apr 5, 2023 | Caption Generation, Image Generation | Unverified | 0 |
| Temporal Knowledge-Aware Image Captioning | Nov 16, 2021 | Caption Generation, Image Captioning | Unverified | 0 |
| Temporal Object Captioning for Street Scene Videos from LiDAR Tracks | May 22, 2025 | Caption Generation, Video Captioning | Unverified | 0 |
| THE DCASE 2021 CHALLENGE TASK 6 SYSTEM: AUTOMATED AUDIO CAPTIONING WITH WEAKLY SUPERVISED PRE-TRAING AND WORD SELECTION METHODS | Jul 6, 2021 | Audio captioning, Caption Generation | Unverified | 0 |
| The NTT DCASE2020 Challenge Task 6 system: Automated Audio Captioning with Keywords and Sentence Length Estimation | Jul 1, 2020 | Audio captioning, Caption Generation | Unverified | 0 |
| The Solution for the ICCV 2023 1st Scientific Figure Captioning Challenge | Mar 26, 2024 | Caption Generation, Image Captioning | Unverified | 0 |
| The Use of Object Labels and Spatial Prepositions as Keywords in a Web-Retrieval-Based Image Caption Generation System | Apr 1, 2017 | Caption Generation, Image Retrieval | Unverified | 0 |
| Time Series Language Model for Descriptive Caption Generation | Jan 3, 2025 | Caption Generation, Denoising | Unverified | 0 |
| TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation | Apr 24, 2025 | Caption Generation, Dense Video Captioning | Unverified | 0 |
| Topic Scene Graph Generation by Attention Distillation From Caption | Jan 1, 2021 | Caption Generation, Graph Generation | Unverified | 0 |
| TPsgtR: Neural-Symbolic Tensor Product Scene-Graph-Triplet Representation for Image Captioning | Nov 22, 2019 | Caption Generation, Image Captioning | Unverified | 0 |
| Uncertainty-Aware Image Captioning | Nov 30, 2022 | Caption Generation, Image Captioning | Unverified | 0 |
| Understanding How Paper Writers Use AI-Generated Captions in Figure Caption Writing | Jan 10, 2025 | Caption Generation | Unverified | 0 |
| Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning | Dec 31, 2024 | Caption Generation, Decoder | Unverified | 0 |
| Unpaired Cross-lingual Image Caption Generation with Self-Supervised Rewards | Aug 15, 2019 | Caption Generation, Image Captioning | Unverified | 0 |
| UNISON: Unpaired Cross-lingual Image Captioning | Oct 3, 2020 | Caption Generation, Image Captioning | Unverified | 0 |
| ViCo: Engaging Video Comment Generation with Human Preference Rewards | Aug 22, 2023 | Caption Generation, Comment Generation | Unverified | 0 |
| Video Caption Dataset for Describing Human Actions in Japanese | Mar 10, 2020 | Caption Generation | Unverified | 0 |
| Video Captioning in Compressed Video | Jan 2, 2021 | Caption Generation, Video Captioning | Unverified | 0 |
| Video Captioning with Guidance of Multimodal Latent Topics | Aug 31, 2017 | Caption Generation, Decoder | Unverified | 0 |
| Vision-Language Modeling Meets Remote Sensing: Models, Datasets and Perspectives | May 20, 2025 | Caption Generation, Contrastive Learning | Unverified | 0 |
| Visual Analytics for Efficient Image Exploration and User-Guided Image Captioning | Nov 2, 2023 | Caption Generation, Efficient Exploration | Unverified | 0 |
| Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation | Apr 30, 2024 | Caption Generation, Hallucination | Unverified | 0 |
| WAT2019: English-Hindi Translation on Hindi Visual Genome Dataset | Nov 1, 2019 | Caption Generation, Translation | Unverified | 0 |
| Denoising Large-Scale Image Captioning from Alt-text Data using Content Selection Models | Sep 10, 2020 | Caption Generation, Denoising | Unverified | 0 |
| Weakly Supervised Dense Video Captioning via Jointly Usage of Knowledge Distillation and Cross-modal Matching | May 18, 2021 | Caption Generation, Cross-Modal Retrieval | Unverified | 0 |
| What is not where: the challenge of integrating spatial representations into deep learning architectures | Jul 21, 2018 | Caption Generation, Deep Learning | Unverified | 0 |
| Word to Sentence Visual Semantic Similarity for Caption Generation: Lessons Learned | Sep 26, 2022 | Caption Generation, Semantic Similarity | Unverified | 0 |
| XMeCap: Meme Caption Generation with Sub-Image Adaptability | Jul 24, 2024 | Caption Generation, Meme Captioning | Unverified | 0 |
| LoHoRavens: A Long-Horizon Language-Conditioned Benchmark for Robotic Tabletop Manipulation | Oct 18, 2023 | Caption Generation, Instruction Following | Unverified | 0 |
| LongCaptioning: Unlocking the Power of Long Caption Generation in Large Multimodal Models | Feb 21, 2025 | Caption Generation, Video Captioning | Unverified | 0 |
| Low-hallucination Synthetic Captions for Large-Scale Vision-Language Model Pre-training | Apr 17, 2025 | Caption Generation, Hallucination | Unverified | 0 |
| LuoJiaHOG: A Hierarchy Oriented Geo-aware Image Caption Dataset for Remote Sensing Image-Text Retrival | Mar 16, 2024 | Caption Generation, Image-text Retrieval | Unverified | 0 |
| MAGIC: Multimodal relAtional Graph adversarIal inferenCe for Diverse and Unpaired Text-based Image Captioning | Dec 13, 2021 | Caption Generation, Descriptive | Unverified | 0 |
| MAMS: Model-Agnostic Module Selection Framework for Video Captioning | Jan 30, 2025 | Caption Generation, Video Captioning | Unverified | 0 |
| MAT: A Multimodal Attentive Translator for Image Captioning | Feb 18, 2017 | Caption Generation, Image Captioning | Unverified | 0 |
| Measuring and Mitigating Hallucinations in Vision-Language Dataset Generation for Remote Sensing | Jan 24, 2025 | Caption Generation, Dataset Generation | Unverified | 0 |
| Medical Image Captioning via Generative Pretrained Transformers | Sep 28, 2022 | Caption Generation, Descriptive | Unverified | 0 |
| MICap: A Unified Model for Identity-aware Movie Descriptions | May 19, 2024 | Caption Generation, Decoder | Unverified | 0 |
| Mind's Eye: A Recurrent Visual Representation for Image Caption Generation | Jun 1, 2015 | Caption Generation, Image Description | Unverified | 0 |
| Multilingual Image Corpus – Towards a Multimodal and Multilingual Dataset | Jun 1, 2022 | Caption Generation, image-classification | Unverified | 0 |
| Multi-modal Dependency Tree for Video Captioning | Dec 1, 2021 | Caption Generation, Dependency Parsing | Unverified | 0 |
| Multi-Modal Generative Embedding Model | May 29, 2024 | Caption Generation, Cross-Modal Retrieval | Unverified | 0 |
| Multimodal Intelligence: Representation Learning, Information Fusion, and Applications | Nov 10, 2019 | Caption Generation, Image Generation | Unverified | 0 |
| Multi-modal reward for visual relationships-based image captioning | Mar 19, 2023 | Caption Generation, Deep Reinforcement Learning | Unverified | 0 |
| Multi-Similarity Contrastive Learning | Jul 6, 2023 | Caption Generation, Contrastive Learning | Unverified | 0 |
| Multi-task Sequence to Sequence Learning | Nov 19, 2015 | Caption Generation, Decoder | Unverified | 0 |