| Analysis of Convolutional Decoder for Image Caption Generation | Mar 8, 2021 | Caption Generation, Data Augmentation | —Unverified | 0 | 0 |
| An encoder-decoder based framework for hindi image caption generation | Jul 9, 2021 | Caption Generation, Decoder | —Unverified | 0 | 0 |
| End-to-End Video Captioning | Apr 4, 2019 | Action Recognition, Caption Generation | —Unverified | 0 | 0 |
| A Thorough Review on Recent Deep Learning Methodologies for Image Captioning | Jul 28, 2021 | Caption Generation, Descriptive | —Unverified | 0 | 0 |
| Attention-based transformer models for image captioning across languages: An in-depth survey and evaluation | Jun 3, 2025 | Caption Generation, Image Captioning | —Unverified | 0 | 0 |
| Automated Audio Captioning: An Overview of Recent Progress and New Challenges | May 12, 2022 | Audio captioning, Caption Generation | —Unverified | 0 | 0 |
| Benchmarking Multimodal Models for Ukrainian Language Understanding Across Academic and Cultural Domains | Nov 22, 2024 | Benchmarking, Caption Generation | —Unverified | 0 | 0 |
| BEV-TSR: Text-Scene Retrieval in BEV Space for Autonomous Driving | Jan 2, 2024 | Autonomous Driving, Caption Generation | —Unverified | 0 | 0 |
| Bi-directional Contextual Attention for 3D Dense Captioning | Aug 13, 2024 | 3D dense captioning, Attribute | —Unverified | 0 | 0 |
| VidCoM: Fast Video Comprehension through Large Language Models with Multimodal Tools | Oct 16, 2023 | Caption Generation, Descriptive | —Unverified | 0 | 0 |
| RealignDiff: Boosting Text-to-Image Diffusion Model with Coarse-to-fine Semantic Re-alignment | May 31, 2023 | Caption Generation, Language Modelling | —Unverified | 0 | 0 |
| Bringing back simplicity and lightliness into neural image captioning | Oct 15, 2018 | Caption Generation, Image Captioning | —Unverified | 0 | 0 |
| CapText: Large Language Model-based Caption Generation From Image Context and Description | Jun 1, 2023 | Caption Generation, Image to text | —Unverified | 0 | 0 |
| Caption Generation of Robot Behaviors based on Unsupervised Learning of Action Segments | Mar 23, 2020 | Caption Generation, Chunking | —Unverified | 0 | 0 |
| Chittron: An Automatic Bangla Image Captioning System | Sep 2, 2018 | Caption Generation, Image Captioning | —Unverified | 0 | 0 |
| Clue: Cross-modal Coherence Modeling for Caption Generation | May 2, 2020 | Caption Generation, controllable image captioning | —Unverified | 0 | 0 |
| Common Subspace for Model and Similarity: Phrase Learning for Caption Generation From Images | Dec 1, 2015 | Caption Generation, Descriptive | —Unverified | 0 | 0 |
| Controlled Caption Generation for Images Through Adversarial Attacks | Jul 7, 2021 | Caption Generation, Image Captioning | —Unverified | 0 | 0 |
| Cortico-cerebellar networks as decoupled neural interfaces | Jan 1, 2021 | Caption Generation | —Unverified | 0 | 0 |
| CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving | Aug 19, 2024 | Autonomous Driving, Caption Generation | —Unverified | 0 | 0 |
| Cross-Lingual Image Caption Generation | Aug 1, 2016 | Caption Generation, Dependency Parsing | —Unverified | 0 | 0 |
| Cross-modal Coherence Modeling for Caption Generation | Jul 1, 2020 | Caption Generation, controllable image captioning | —Unverified | 0 | 0 |
| D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding | Dec 2, 2021 | 3D dense captioning, 3D visual grounding | —Unverified | 0 | 0 |
| DECap: Towards Generalized Explicit Caption Editing via Diffusion Mechanism | Nov 25, 2023 | Caption Generation, Denoising | —Unverified | 0 | 0 |
| Deep Bayesian Natural Language Processing | Jul 1, 2019 | Caption Generation, Clustering | —Unverified | 0 | 0 |