| Describing Multimedia Content using Attention-based Encoder--Decoder Networks | Jul 4, 2015 | Caption Generation, Decoder | —Unverified | 0 |
| Describing Natural Images Containing Novel Objects with Knowledge Guided Assitance | Oct 17, 2017 | Caption Generation | —Unverified | 0 |
| Caption Generation on Scenes with Seen and Unseen Object Categories | Aug 13, 2021 | Caption Generation, Language Modelling | —Unverified | 0 |
| DiffCap: Exploring Continuous Diffusion on Image Captioning | May 20, 2023 | Caption Generation, Diversity | —Unverified | 0 |
| DIR: Retrieval-Augmented Image Captioning with Comprehensive Understanding | Dec 2, 2024 | Caption Generation, Domain Generalization | —Unverified | 0 |
| Diverse and Accurate Image Description Using a Variational Auto-Encoder with an Additive Gaussian Encoding Space | Nov 19, 2017 | Caption Generation, Image Description | —Unverified | 0 |
| Does Object Grounding Really Reduce Hallucination of Large Vision-Language Models? | Jun 20, 2024 | Caption Generation, Hallucination | —Unverified | 0 |
| Do Large Multimodal Models Solve Caption Generation for Scientific Figures? Lessons Learned from SCICAP Challenge 2023 | Jan 31, 2025 | Articles, Caption Generation | —Unverified | 0 |
| Domain Adaptation for Neural Networks by Parameter Augmentation | Jul 1, 2016 | Caption Generation, Domain Adaptation | —Unverified | 0 |
| DS@BioMed at ImageCLEFmedical Caption 2024: Enhanced Attention Mechanisms in Medical Caption Generation through Concept Detection Integration | Jun 1, 2024 | Caption Generation, Image Captioning | —Unverified | 0 |
| EditInspector: A Benchmark for Evaluation of Text-Guided Image Edits | Jun 11, 2025 | Artifact Detection, Caption Generation | —Unverified | 0 |
| Efficient Audio Captioning Transformer with Patchout and Text Guidance | Apr 6, 2023 | Audio captioning, Caption Generation | —Unverified | 0 |
| E-MMAD: Multimodal Advertising Caption Generation Based on Structured Information | Nov 16, 2021 | Caption Generation, valid | —Unverified | 0 |
| Empirical Analysis of Image Caption Generation using Deep Learning | May 14, 2021 | Caption Generation, Decoder | —Unverified | 0 |
| End to End Recognition System for Recognizing Offline Unconstrained Vietnamese Handwriting | May 14, 2019 | Caption Generation, Decoder | —Unverified | 0 |
| Enhancing Chest X-ray Classification through Knowledge Injection in Cross-Modality Learning | Feb 19, 2025 | Caption Generation, Classification | —Unverified | 0 |
| Enhancing Image Caption Generation Using Reinforcement Learning with Human Feedback | Mar 11, 2024 | Caption Generation, reinforcement-learning | —Unverified | 0 |
| Enhancing Image Captioning with Neural Models | Dec 1, 2023 | Caption Generation, Image Captioning | —Unverified | 0 |
| Entity-aware Image Caption Generation | Apr 21, 2018 | Caption Generation, Image Captioning | —Unverified | 0 |
| Error Causal inference for Multi-Fusion models | Jun 1, 2021 | Caption Generation, Causal Inference | —Unverified | 0 |
| Evaluation of Automatic Video Captioning Using Direct Assessment | Oct 29, 2017 | Caption Generation, Machine Translation | —Unverified | 0 |
| Everything is a Video: Unifying Modalities through Next-Frame Prediction | Nov 15, 2024 | Caption Generation, Cross-Modal Retrieval | —Unverified | 0 |
| Neural Attention Models for Sequence Classification: Analysis and Application to Key Term Extraction and Dialogue Act Detection | Mar 31, 2016 | Caption Generation, Classification | —Unverified | 0 |
| Neural Caption Generation for News Images | May 1, 2018 | Caption Generation, Image Captioning | —Unverified | 0 |
| NEXT: Multi-Grained Mixture of Experts via Text-Modulation for Multi-Modal Object Re-ID | May 26, 2025 | Attribute, Caption Generation | —Unverified | 0 |
| NLPHut’s Participation at WAT2021 | Aug 1, 2021 | Caption Generation, Image Captioning | —Unverified | 0 |
| NOC-REK: Novel Object Captioning with Retrieved Vocabulary from External Knowledge | Mar 28, 2022 | Caption Generation, Object | —Unverified | 0 |
| O2NA: An Object-Oriented Non-Autoregressive Approach for Controllable Video Captioning | Aug 5, 2021 | Attribute, Caption Generation | —Unverified | 0 |
| OBJ2TEXT: Generating Visually Descriptive Language from Object Layouts | Jul 22, 2017 | Caption Generation, Descriptive | —Unverified | 0 |
| PathM3: A Multimodal Multi-Task Multiple Instance Learning Framework for Whole Slide Image Classification and Captioning | Mar 13, 2024 | Caption Generation, Diagnostic | —Unverified | 0 |
| Predicting the Mumble of Wireless Channel with Sequence-to-Sequence Models | Jan 14, 2019 | Caption Generation, Language Modeling | —Unverified | 0 |
| Relationship-based Neural Baby Talk | Mar 8, 2021 | Caption Generation, Graph Attention | —Unverified | 0 |
| REST: REtrieve & Self-Train for generative action recognition | Sep 29, 2022 | Action Recognition, Caption Generation | —Unverified | 0 |
| Rethinking the Form of Latent States in Image Captioning | Jul 26, 2018 | Caption Generation, Form | —Unverified | 0 |
| Retrieval-Augmented Multimodal Language Modeling | Nov 22, 2022 | Caption Generation, Image Captioning | —Unverified | 0 |
| Review Networks for Caption Generation | May 25, 2016 | Caption Generation, Decoder | —Unverified | 0 |
| RUC+CMU: System Report for Dense Captioning Events in Videos | Jun 22, 2018 | Caption Generation, Dense Captioning | —Unverified | 0 |
| Scene-based Factored Attention for Image Captioning | Aug 7, 2019 | Caption Generation, Decoder | —Unverified | 0 |
| Scene Graph Generation for Better Image Captioning? | Sep 23, 2021 | Caption Generation, Graph Generation | —Unverified | 0 |
| Scene Understanding for Autonomous Manipulation with Deep Learning | Mar 23, 2019 | Action Understanding, Affordance Detection | —Unverified | 0 |
| See It All: Contextualized Late Aggregation for 3D Dense Captioning | Aug 14, 2024 | 3D dense captioning, All | —Unverified | 0 |
| Seq2Mol: Automatic design of de novo molecules conditioned by the target protein sequences through deep neural networks | Oct 29, 2020 | Caption Generation, Language Modelling | —Unverified | 0 |
| Sequence to Sequence - Video to Text | Dec 1, 2015 | Caption Generation, Language Modeling | —Unverified | 0 |
| Set Prediction Guided by Semantic Concepts for Diverse Video Captioning | Dec 25, 2023 | Caption Generation, Diversity | —Unverified | 0 |
| Simultaneous Segmentation and Recognition: Towards more accurate Ego Gesture Recognition | Sep 18, 2019 | Activity Recognition, Caption Generation | —Unverified | 0 |
| Skip-Gram − Zipf + Uniform = Vector Additivity | Jul 1, 2017 | Caption Generation, Dimensionality Reduction | —Unverified | 0 |
| Social Media Ready Caption Generation for Brands | Jan 3, 2024 | Caption Generation, Image Captioning | —Unverified | 0 |
| Soft + Hardwired Attention: An LSTM Framework for Human Trajectory Prediction and Abnormal Event Detection | Feb 18, 2017 | Caption Generation, Event Detection | —Unverified | 0 |
| Spatio-Temporal Dynamics and Semantic Attribute Enriched Visual Encoding for Video Captioning | Feb 27, 2019 | Attribute, Caption Generation | —Unverified | 0 |
| Stacked Cross-modal Feature Consolidation Attention Networks for Image Captioning | Feb 8, 2023 | Caption Generation, Decoder | —Unverified | 0 |