| Guiding the Long-Short Term Memory Model for Image Caption Generation | Dec 1, 2015 | Caption Generation | —Unverified | 0 |
| D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding | Dec 2, 2021 | 3D dense captioning, 3D visual grounding | —Unverified | 0 |
| Hierarchical LSTM with Adjusted Temporal Attention for Video Captioning | Jun 5, 2017 | Caption Generation, Decoder | —Unverified | 0 |
| Fusion Models for Improved Visual Captioning | Oct 28, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| GC-KBVQA: A New Four-Stage Framework for Enhancing Knowledge Based Visual Question Answering Performance | May 25, 2025 | Caption Generation, Question Answering | —Unverified | 0 |
| GEM-VPC: A dual Graph-Enhanced Multimodal integration for Video Paragraph Captioning | Oct 12, 2024 | Caption Generation, Decoder | —Unverified | 0 |
| Generating captions without looking beyond objects | Oct 12, 2016 | Caption Generation, Image Captioning | —Unverified | 0 |
| Generating Image Captions in Arabic using Root-Word Based Recurrent Neural Networks and Deep Neural Networks | Jun 1, 2018 | Caption Generation, Image Captioning | —Unverified | 0 |
| Generating image captions with external encyclopedic knowledge | Oct 10, 2022 | Caption Generation, Image Captioning | —Unverified | 0 |
| Empirical Analysis of Image Caption Generation using Deep Learning | May 14, 2021 | Caption Generation, Decoder | —Unverified | 0 |