| Regularizing RNNs for Caption Generation by Reconstructing The Past with The Present | Mar 30, 2018 | Caption Generation, Decoder | Code Available | 0 | 5 |
| Twin Networks: Matching the Future for Sequence Generation | Aug 22, 2017 | Caption Generation, speech-recognition | Code Available | 0 | 5 |
| R^3Net: Relation-embedded Representation Reconstruction Network for Change Captioning | Oct 20, 2021 | Caption Generation, Relation | Code Available | 0 | 5 |
| Discriminability objective for training descriptive captions | Mar 12, 2018 | Caption Generation, Descriptive | Code Available | 0 | 5 |
| Pre-gen metrics: Predicting caption quality metrics without generating captions | Oct 12, 2018 | Caption Generation | Code Available | 0 | 5 |
| R^3Net: Relation-embedded Representation Reconstruction Network for Change Captioning | Nov 1, 2021 | Caption Generation, Relation | Code Available | 0 | 5 |
| Multi-source weak supervision for saliency detection | Apr 1, 2019 | Caption Generation, Saliency Detection | Code Available | 0 | 5 |
| Humor in AI: Massive Scale Crowd-Sourced Preferences and Benchmarks for Cartoon Captioning | Jun 15, 2024 | Caption Generation | Code Available | 0 | 5 |
| Multimodal Preference Data Synthetic Alignment with Reward Model | Dec 23, 2024 | 2k, Caption Generation | Code Available | 0 | 5 |
| Recurrent Neural Network Regularization | Sep 8, 2014 | Caption Generation, Image Captioning | Code Available | 0 | 5 |
| Mol2Lang-VLM: Vision- and Text-Guided Generative Pre-trained Language Models for Advancing Molecule Captioning through Multimodal Fusion | Aug 15, 2024 | Caption Generation, Decoder | Code Available | 0 | 5 |
| Attacking Visual Language Grounding with Adversarial Examples: A Case Study on Neural Image Captioning | Dec 6, 2017 | Caption Generation, Decoder | Code Available | 0 | 5 |
| NICGSlowDown: Evaluating the Efficiency Robustness of Neural Image Caption Generation Models | Mar 29, 2022 | Caption Generation | Code Available | 0 | 5 |
| Local Information Assisted Attention-free Decoder for Audio Captioning | Jan 10, 2022 | Audio captioning, Caption Generation | Code Available | 0 | 5 |
| Guiding Long-Short Term Memory for Image Caption Generation | Sep 16, 2015 | Caption Generation | Code Available | 0 | 5 |
| LAViTeR: Learning Aligned Visual and Textual Representations Assisted by Image and Caption Generation | Sep 4, 2021 | Caption Generation, Image Captioning | Code Available | 0 | 5 |
| 3D CoCa: Contrastive Learners are 3D Captioners | Apr 13, 2025 | 3D dense captioning, Caption Generation | Code Available | 0 | 5 |
| Journalistic Guidelines Aware News Image Captioning | Sep 7, 2021 | Caption Generation, Descriptive | Code Available | 0 | 5 |
| Memeify: A Large-Scale Meme Generation System | Oct 27, 2019 | Caption Generation, Decoder | Code Available | 0 | 5 |
| Event and Entity Extraction from Generated Video Captions | Nov 5, 2022 | Caption Generation, Dense Video Captioning | Code Available | 0 | 5 |
| GNNFormer: A Graph-based Framework for Cytopathology Report Generation | Mar 17, 2023 | Caption Generation, Graph Neural Network | Unverified | 0 | 0 |
| Denoising Large-Scale Image Captioning from Alt-text Data using Content Selection Models | Sep 17, 2021 | Caption Generation, Denoising | Unverified | 0 | 0 |
| Deep Verifier Networks: Verification of Deep Discriminative Models with Deep Generative Models | Nov 18, 2019 | Anomaly Detection, Autonomous Driving | Unverified | 0 | 0 |
| Geometry-Entangled Visual Semantic Transformer for Image Captioning | Sep 29, 2021 | Caption Generation, Image Captioning | Unverified | 0 | 0 |
| Geo-Aware Image Caption Generation | Dec 1, 2020 | Caption Generation, Image Captioning | Unverified | 0 | 0 |
| GNN-ViTCap: GNN-Enhanced Multiple Instance Learning with Vision Transformers for Whole Slide Image Classification and Captioning | Jul 9, 2025 | Caption Generation, Clustering | Unverified | 0 | 0 |
| Generating Video Description using Sequence-to-sequence Model with Temporal Attention | Dec 1, 2016 | Caption Generation, Sentence | Unverified | 0 | 0 |
| Generating image captions with external encyclopedic knowledge | Oct 10, 2022 | Caption Generation, Image Captioning | Unverified | 0 | 0 |
| Deep Learning Approaches on Image Captioning: A Review | Jan 31, 2022 | Caption Generation, Deep Learning | Unverified | 0 | 0 |
| VidCoM: Fast Video Comprehension through Large Language Models with Multimodal Tools | Oct 16, 2023 | Caption Generation, Descriptive | Unverified | 0 | 0 |
| End-to-End Video Captioning | Apr 4, 2019 | Action Recognition, Caption Generation | Unverified | 0 | 0 |
| Generating Image Captions in Arabic using Root-Word Based Recurrent Neural Networks and Deep Neural Networks | Jun 1, 2018 | Caption Generation, Image Captioning | Unverified | 0 | 0 |
| Generating captions without looking beyond objects | Oct 12, 2016 | Caption Generation, Image Captioning | Unverified | 0 | 0 |
| GEM-VPC: A dual Graph-Enhanced Multimodal integration for Video Paragraph Captioning | Oct 12, 2024 | Caption Generation, Decoder | Unverified | 0 | 0 |
| GC-KBVQA: A New Four-Stage Framework for Enhancing Knowledge Based Visual Question Answering Performance | May 25, 2025 | Caption Generation, Question Answering | Unverified | 0 | 0 |
| Deep Bayesian Natural Language Processing | Jul 1, 2019 | Caption Generation, Clustering | Unverified | 0 | 0 |
| Bi-directional Contextual Attention for 3D Dense Captioning | Aug 13, 2024 | 3D dense captioning, Attribute | Unverified | 0 | 0 |
| Fusion Models for Improved Visual Captioning | Oct 28, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 | 0 |
| DECap: Towards Generalized Explicit Caption Editing via Diffusion Mechanism | Nov 25, 2023 | Caption Generation, Denoising | Unverified | 0 | 0 |
| D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding | Dec 2, 2021 | 3D dense captioning, 3D visual grounding | Unverified | 0 | 0 |
| BEV-TSR: Text-Scene Retrieval in BEV Space for Autonomous Driving | Jan 2, 2024 | Autonomous Driving, Caption Generation | Unverified | 0 | 0 |
| An encoder-decoder based framework for Hindi image caption generation | Jul 9, 2021 | Caption Generation, Decoder | Unverified | 0 | 0 |
| Advancing Large Multi-modal Models with Explicit Chain-of-Reasoning and Visual Question Generation | Jan 18, 2024 | Caption Generation, Language Modeling | Unverified | 0 | 0 |
| Fine-Grained Video Captioning through Scene Graph Consolidation | Feb 23, 2025 | Caption Generation, Image Captioning | Unverified | 0 | 0 |
| Cross-modal Coherence Modeling for Caption Generation | Jul 1, 2020 | Caption Generation, controllable image captioning | Unverified | 0 | 0 |
| FE-LWS: Refined Image-Text Representations via Decoder Stacking and Fused Encodings for Remote Sensing Image Captioning | Feb 13, 2025 | Caption Generation, Decoder | Unverified | 0 | 0 |
| Cross-Lingual Image Caption Generation | Aug 1, 2016 | Caption Generation, Dependency Parsing | Unverified | 0 | 0 |
| Less for More: Enhanced Feedback-aligned Mixed LLMs for Molecule Caption Generation and Fine-Grained NLI Evaluation | May 22, 2024 | Caption Generation, Hallucination | Unverified | 0 | 0 |
| Feature Fusion Effects of Tensor Product Representation on (De)Compositional Network for Caption Generation for Images | Dec 17, 2018 | Caption Generation, Image Captioning | Unverified | 0 | 0 |
| Fast Image Caption Generation with Position Alignment | Dec 13, 2019 | Caption Generation, Decoder | Unverified | 0 | 0 |