| WAT2019: English-Hindi Translation on Hindi Visual Genome Dataset | Nov 1, 2019 | Caption Generation, Translation | —Unverified | 0 |
| Denoising Large-Scale Image Captioning from Alt-text Data using Content Selection Models | Sep 10, 2020 | Caption Generation, Denoising | —Unverified | 0 |
| Weakly Supervised Dense Video Captioning via Jointly Usage of Knowledge Distillation and Cross-modal Matching | May 18, 2021 | Caption Generation, Cross-Modal Retrieval | —Unverified | 0 |
| What is not where: the challenge of integrating spatial representations into deep learning architectures | Jul 21, 2018 | Caption Generation, Deep Learning | —Unverified | 0 |
| Word to Sentence Visual Semantic Similarity for Caption Generation: Lessons Learned | Sep 26, 2022 | Caption Generation, Semantic Similarity | —Unverified | 0 |
| XMeCap: Meme Caption Generation with Sub-Image Adaptability | Jul 24, 2024 | Caption Generation, Meme Captioning | —Unverified | 0 |
| LoHoRavens: A Long-Horizon Language-Conditioned Benchmark for Robotic Tabletop Manipulation | Oct 18, 2023 | Caption Generation, Instruction Following | —Unverified | 0 |
| LongCaptioning: Unlocking the Power of Long Caption Generation in Large Multimodal Models | Feb 21, 2025 | Caption Generation, Video Captioning | —Unverified | 0 |
| Low-hallucination Synthetic Captions for Large-Scale Vision-Language Model Pre-training | Apr 17, 2025 | Caption Generation, Hallucination | —Unverified | 0 |
| LuoJiaHOG: A Hierarchy Oriented Geo-aware Image Caption Dataset for Remote Sensing Image-Text Retrival | Mar 16, 2024 | Caption Generation, Image-text Retrieval | —Unverified | 0 |
| MAGIC: Multimodal relAtional Graph adversarIal inferenCe for Diverse and Unpaired Text-based Image Captioning | Dec 13, 2021 | Caption Generation, Descriptive | —Unverified | 0 |
| MAMS: Model-Agnostic Module Selection Framework for Video Captioning | Jan 30, 2025 | Caption Generation, Video Captioning | —Unverified | 0 |
| MAT: A Multimodal Attentive Translator for Image Captioning | Feb 18, 2017 | Caption Generation, Image Captioning | —Unverified | 0 |
| Measuring and Mitigating Hallucinations in Vision-Language Dataset Generation for Remote Sensing | Jan 24, 2025 | Caption Generation, Dataset Generation | —Unverified | 0 |
| Medical Image Captioning via Generative Pretrained Transformers | Sep 28, 2022 | Caption Generation, Descriptive | —Unverified | 0 |
| MICap: A Unified Model for Identity-aware Movie Descriptions | May 19, 2024 | Caption Generation, Decoder | —Unverified | 0 |
| Mind's Eye: A Recurrent Visual Representation for Image Caption Generation | Jun 1, 2015 | Caption Generation, Image Description | —Unverified | 0 |
| Multilingual Image Corpus – Towards a Multimodal and Multilingual Dataset | Jun 1, 2022 | Caption Generation, image-classification | —Unverified | 0 |
| Multi-modal Dependency Tree for Video Captioning | Dec 1, 2021 | Caption Generation, Dependency Parsing | —Unverified | 0 |
| Multi-Modal Generative Embedding Model | May 29, 2024 | Caption Generation, Cross-Modal Retrieval | —Unverified | 0 |
| Multimodal Intelligence: Representation Learning, Information Fusion, and Applications | Nov 10, 2019 | Caption Generation, Image Generation | —Unverified | 0 |
| Multi-modal reward for visual relationships-based image captioning | Mar 19, 2023 | Caption Generation, Deep Reinforcement Learning | —Unverified | 0 |
| Multi-Similarity Contrastive Learning | Jul 6, 2023 | Caption Generation, Contrastive Learning | —Unverified | 0 |
| Multi-task Sequence to Sequence Learning | Nov 19, 2015 | Caption Generation, Decoder | —Unverified | 0 |
| Neural Attention Models for Sequence Classification: Analysis and Application to Key Term Extraction and Dialogue Act Detection | Mar 31, 2016 | Caption Generation, Classification | —Unverified | 0 |