| Everything is a Video: Unifying Modalities through Next-Frame Prediction | Nov 15, 2024 | Caption Generation, Cross-Modal Retrieval | —Unverified | 0 | 0 |
| Examining the Effects of Language-and-Vision Data Augmentation for Generation of Descriptions of Human Faces | Jun 1, 2022 | Caption Generation, Data Augmentation | —Unverified | 0 | 0 |
| Explainable Image Captioning using CNN- CNN architecture and Hierarchical Attention | Jun 28, 2024 | Caption Generation, Decoder | —Unverified | 0 | 0 |
| EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer | Sep 17, 2024 | Audio Generation, Caption Generation | —Unverified | 0 | 0 |
| FaceGemma: Enhancing Image Captioning with Facial Attributes for Portrait Images | Sep 24, 2023 | Attribute, Caption Generation | —Unverified | 0 | 0 |
| Fast, Diverse and Accurate Image Captioning Guided By Part-of-Speech | May 31, 2018 | Caption Generation, Diversity | —Unverified | 0 | 0 |
| Fast Image Caption Generation with Position Alignment | Dec 13, 2019 | Caption Generation, Decoder | —Unverified | 0 | 0 |
| Feature Fusion Effects of Tensor Product Representation on (De)Compositional Network for Caption Generation for Images | Dec 17, 2018 | Caption Generation, Image Captioning | —Unverified | 0 | 0 |
| Less for More: Enhanced Feedback-aligned Mixed LLMs for Molecule Caption Generation and Fine-Grained NLI Evaluation | May 22, 2024 | Caption Generation, Hallucination | —Unverified | 0 | 0 |
| FE-LWS: Refined Image-Text Representations via Decoder Stacking and Fused Encodings for Remote Sensing Image Captioning | Feb 13, 2025 | Caption Generation, Decoder | —Unverified | 0 | 0 |
| Fine-Grained Video Captioning through Scene Graph Consolidation | Feb 23, 2025 | Caption Generation, Image Captioning | —Unverified | 0 | 0 |
| Fusion Models for Improved Visual Captioning | Oct 28, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 | 0 |
| GC-KBVQA: A New Four-Stage Framework for Enhancing Knowledge Based Visual Question Answering Performance | May 25, 2025 | Caption Generation, Question Answering | —Unverified | 0 | 0 |
| GEM-VPC: A dual Graph-Enhanced Multimodal integration for Video Paragraph Captioning | Oct 12, 2024 | Caption Generation, Decoder | —Unverified | 0 | 0 |
| Generating captions without looking beyond objects | Oct 12, 2016 | Caption Generation, Image Captioning | —Unverified | 0 | 0 |
| Generating Image Captions in Arabic using Root-Word Based Recurrent Neural Networks and Deep Neural Networks | Jun 1, 2018 | Caption Generation, Image Captioning | —Unverified | 0 | 0 |
| Generating image captions with external encyclopedic knowledge | Oct 10, 2022 | Caption Generation, Image Captioning | —Unverified | 0 | 0 |
| Generating Video Description using Sequence-to-sequence Model with Temporal Attention | Dec 1, 2016 | Caption Generation, Sentence | —Unverified | 0 | 0 |
| Geo-Aware Image Caption Generation | Dec 1, 2020 | Caption Generation, Image Captioning | —Unverified | 0 | 0 |
| Geometry-Entangled Visual Semantic Transformer for Image Captioning | Sep 29, 2021 | Caption Generation, Image Captioning | —Unverified | 0 | 0 |
| GNNFormer: A Graph-based Framework for Cytopathology Report Generation | Mar 17, 2023 | Caption Generation, Graph Neural Network | —Unverified | 0 | 0 |
| GNN-ViTCap: GNN-Enhanced Multiple Instance Learning with Vision Transformers for Whole Slide Image Classification and Captioning | Jul 9, 2025 | Caption Generation, Clustering | —Unverified | 0 | 0 |
| Goal-driven text descriptions for images | Aug 28, 2021 | AI Agent, Caption Generation | —Unverified | 0 | 0 |
| Grounded Video Caption Generation | Nov 12, 2024 | Caption Generation, Image Captioning | —Unverified | 0 | 0 |
| Group-based Distinctive Image Captioning with Memory Difference Encoding and Attention | Apr 3, 2025 | Caption Generation, Contrastive Learning | —Unverified | 0 | 0 |
| Guiding Attention using Partial-Order Relationships for Image Captioning | Apr 15, 2022 | Caption Generation, Image Captioning | —Unverified | 0 | 0 |
| Guiding the Long-Short Term Memory Model for Image Caption Generation | Dec 1, 2015 | Caption Generation | —Unverified | 0 | 0 |
| HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning | May 25, 2023 | Caption Generation, Decoder | —Unverified | 0 | 0 |
| Hierarchical LSTMs with Adaptive Attention for Visual Captioning | Dec 26, 2018 | Caption Generation, Image Captioning | —Unverified | 0 | 0 |
| Hierarchical LSTM with Adjusted Temporal Attention for Video Captioning | Jun 5, 2017 | Caption Generation, Decoder | —Unverified | 0 | 0 |
| I2T2I: Learning Text to Image Synthesis with Textual Data Augmentation | Mar 20, 2017 | Caption Generation, Data Augmentation | —Unverified | 0 | 0 |
| IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification | Mar 13, 2025 | Caption Generation | —Unverified | 0 | 0 |
| Identifying Multi-modal Knowledge Neurons in Pretrained Transformers via Two-stage Filtering | Mar 29, 2025 | Caption Generation, knowledge editing | —Unverified | 0 | 0 |
| IG Captioner: Information Gain Captioners are Strong Zero-shot Classifiers | Nov 27, 2023 | Caption Generation, Image-text Retrieval | —Unverified | 0 | 0 |
| Image Caption Generation for Low-Resource Assamese Language | Nov 1, 2022 | Caption Generation, Decoder | —Unverified | 0 | 0 |
| Image Caption Generation Framework for Assamese News using Attention Mechanism | Dec 1, 2021 | Caption Generation, Decoder | —Unverified | 0 | 0 |
| Image Captioning using Facial Expression and Attention | Aug 8, 2019 | Caption Generation, Image Captioning | —Unverified | 0 | 0 |
| Image Captioning with Integrated Bottom-Up and Multi-level Residual Top-Down Attention for Game Scene Understanding | Jun 16, 2019 | Caption Generation, Image Captioning | —Unverified | 0 | 0 |
| Image Captioning with Unseen Objects | Jul 31, 2019 | Caption Generation, Image Captioning | —Unverified | 0 | 0 |
| Image Position Prediction in Multimodal Documents | May 1, 2020 | Articles, Caption Generation | —Unverified | 0 | 0 |
| Image Representations and New Domains in Neural Image Captioning | Aug 9, 2015 | Caption Generation, Image Captioning | —Unverified | 0 | 0 |
| Image to Bengali Caption Generation Using Deep CNN and Bidirectional Gated Recurrent Unit | Dec 22, 2020 | Caption Generation, Decoder | —Unverified | 0 | 0 |
| Improving Image Captioning with Better Use of Caption | Jul 1, 2020 | Caption Generation, Image Captioning | —Unverified | 0 | 0 |
| Integrating Frequency-Domain Representations with Low-Rank Adaptation in Vision-Language Models | Mar 8, 2025 | Caption Generation, Question Answering | —Unverified | 0 | 0 |
| Knowledge Distillation for Efficient Audio-Visual Video Captioning | Jun 16, 2023 | Audio-Visual Video Captioning, Caption Generation | —Unverified | 0 | 0 |
| Knowledge driven Description Synthesis for Floor Plan Interpretation | Mar 15, 2021 | Caption Generation, Descriptive | —Unverified | 0 | 0 |
| Language Production Dynamics with Recurrent Neural Networks | Jul 1, 2018 | Caption Generation, Language Modeling | —Unverified | 0 | 0 |
| LaPIG: Cross-Modal Generation of Paired Thermal and Visible Facial Images | Mar 20, 2025 | Caption Generation, Diversity | —Unverified | 0 | 0 |
| Learning a Recurrent Visual Representation for Image Caption Generation | Nov 20, 2014 | Caption Generation, Image Retrieval | —Unverified | 0 | 0 |
| Learning from Massive Human Videos for Universal Humanoid Pose Control | Dec 18, 2024 | Caption Generation, Humanoid Control | —Unverified | 0 | 0 |