| Everything is a Video: Unifying Modalities through Next-Frame Prediction | Nov 15, 2024 | Caption Generation, Cross-Modal Retrieval | —Unverified | 0 | 0 |
| Examining the Effects of Language-and-Vision Data Augmentation for Generation of Descriptions of Human Faces | Jun 1, 2022 | Caption Generation, Data Augmentation | —Unverified | 0 | 0 |
| Explainable Image Captioning using CNN-CNN architecture and Hierarchical Attention | Jun 28, 2024 | Caption Generation, Decoder | —Unverified | 0 | 0 |
| EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer | Sep 17, 2024 | Audio Generation, Caption Generation | —Unverified | 0 | 0 |
| FaceGemma: Enhancing Image Captioning with Facial Attributes for Portrait Images | Sep 24, 2023 | Attribute, Caption Generation | —Unverified | 0 | 0 |
| Fast, Diverse and Accurate Image Captioning Guided By Part-of-Speech | May 31, 2018 | Caption Generation, Diversity | —Unverified | 0 | 0 |
| Fast Image Caption Generation with Position Alignment | Dec 13, 2019 | Caption Generation, Decoder | —Unverified | 0 | 0 |
| Feature Fusion Effects of Tensor Product Representation on (De)Compositional Network for Caption Generation for Images | Dec 17, 2018 | Caption Generation, Image Captioning | —Unverified | 0 | 0 |
| Less for More: Enhanced Feedback-aligned Mixed LLMs for Molecule Caption Generation and Fine-Grained NLI Evaluation | May 22, 2024 | Caption Generation, Hallucination | —Unverified | 0 | 0 |
| FE-LWS: Refined Image-Text Representations via Decoder Stacking and Fused Encodings for Remote Sensing Image Captioning | Feb 13, 2025 | Caption Generation, Decoder | —Unverified | 0 | 0 |