| CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving | Aug 19, 2024 | Autonomous Driving, Caption Generation | —Unverified | 0 |
| Benchmarking Multimodal Models for Ukrainian Language Understanding Across Academic and Cultural Domains | Nov 22, 2024 | Benchmarking, Caption Generation | —Unverified | 0 |
| A Deep Neural Framework for Image Caption Generation Using GRU-Based Attention Mechanism | Mar 3, 2022 | Caption Generation, Decoder | —Unverified | 0 |
| Explainable Image Captioning using CNN-CNN architecture and Hierarchical Attention | Jun 28, 2024 | Caption Generation, Decoder | —Unverified | 0 |
| Analysis of Convolutional Decoder for Image Caption Generation | Mar 8, 2021 | Caption Generation, Data Augmentation | —Unverified | 0 |
| Controlled Caption Generation for Images Through Adversarial Attacks | Jul 7, 2021 | Caption Generation, Image Captioning | —Unverified | 0 |
| Evaluation of Automatic Video Captioning Using Direct Assessment | Oct 29, 2017 | Caption Generation, Machine Translation | —Unverified | 0 |
| 3G structure for image caption generation | Apr 21, 2019 | Caption Generation, Sentence | —Unverified | 0 |
| Geo-Aware Image Caption Generation | Dec 1, 2020 | Caption Generation, Image Captioning | —Unverified | 0 |
| Geometry-Entangled Visual Semantic Transformer for Image Captioning | Sep 29, 2021 | Caption Generation, Image Captioning | —Unverified | 0 |
| GNN-ViTCap: GNN-Enhanced Multiple Instance Learning with Vision Transformers for Whole Slide Image Classification and Captioning | Jul 9, 2025 | Caption Generation, Clustering | —Unverified | 0 |
| GEM-VPC: A dual Graph-Enhanced Multimodal integration for Video Paragraph Captioning | Oct 12, 2024 | Caption Generation, Decoder | —Unverified | 0 |
| Entity-aware Image Caption Generation | Apr 21, 2018 | Caption Generation, Image Captioning | —Unverified | 0 |
| Enhancing Image Captioning with Neural Models | Dec 1, 2023 | Caption Generation, Image Captioning | —Unverified | 0 |
| Generating captions without looking beyond objects | Oct 12, 2016 | Caption Generation, Image Captioning | —Unverified | 0 |
| Enhancing Image Caption Generation Using Reinforcement Learning with Human Feedback | Mar 11, 2024 | Caption Generation, reinforcement-learning | —Unverified | 0 |
| A Comparative Study of Pre-trained CNNs and GRU-Based Attention for Image Caption Generation | Oct 11, 2023 | Caption Generation, Decoder | —Unverified | 0 |
| Error Causal inference for Multi-Fusion models | Jun 1, 2021 | Caption Generation, Causal Inference | —Unverified | 0 |
| GC-KBVQA: A New Four-Stage Framework for Enhancing Knowledge Based Visual Question Answering Performance | May 25, 2025 | Caption Generation, Question Answering | —Unverified | 0 |
| Enhancing Chest X-ray Classification through Knowledge Injection in Cross-Modality Learning | Feb 19, 2025 | Caption Generation, Classification | —Unverified | 0 |
| Everything is a Video: Unifying Modalities through Next-Frame Prediction | Nov 15, 2024 | Caption Generation, Cross-Modal Retrieval | —Unverified | 0 |
| Examining the Effects of Language-and-Vision Data Augmentation for Generation of Descriptions of Human Faces | Jun 1, 2022 | Caption Generation, Data Augmentation | —Unverified | 0 |
| Cortico-cerebellar networks as decoupled neural interfaces | Jan 1, 2021 | Caption Generation | —Unverified | 0 |
| End to End Recognition System for Recognizing Offline Unconstrained Vietnamese Handwriting | May 14, 2019 | Caption Generation, Decoder | —Unverified | 0 |
| Fusion Models for Improved Visual Captioning | Oct 28, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |