| GNN-ViTCap: GNN-Enhanced Multiple Instance Learning with Vision Transformers for Whole Slide Image Classification and Captioning | Jul 9, 2025 | Caption Generation, Clustering | Unverified | 0 |
| LaPIG: Cross-Modal Generation of Paired Thermal and Visible Facial Images | Mar 20, 2025 | Caption Generation, Diversity | Unverified | 0 |
| Automated Audio Captioning: An Overview of Recent Progress and New Challenges | May 12, 2022 | Audio captioning, Caption Generation | Unverified | 0 |
| Knowledge driven Description Synthesis for Floor Plan Interpretation | Mar 15, 2021 | Caption Generation, Descriptive | Unverified | 0 |
| Efficient Audio Captioning Transformer with Patchout and Text Guidance | Apr 6, 2023 | Audio captioning, Caption Generation | Unverified | 0 |
| EditInspector: A Benchmark for Evaluation of Text-Guided Image Edits | Jun 11, 2025 | Artifact Detection, Caption Generation | Unverified | 0 |
| Common Subspace for Model and Similarity: Phrase Learning for Caption Generation From Images | Dec 1, 2015 | Caption Generation, Descriptive | Unverified | 0 |
| Language Production Dynamics with Recurrent Neural Networks | Jul 1, 2018 | Caption Generation, Language Modeling | Unverified | 0 |
| LoHoRavens: A Long-Horizon Language-Conditioned Benchmark for Robotic Tabletop Manipulation | Oct 18, 2023 | Caption Generation, Instruction Following | Unverified | 0 |
| Clue: Cross-modal Coherence Modeling for Caption Generation | May 2, 2020 | Caption Generation, controllable image captioning | Unverified | 0 |
| DS@BioMed at ImageCLEFmedical Caption 2024: Enhanced Attention Mechanisms in Medical Caption Generation through Concept Detection Integration | Jun 1, 2024 | Caption Generation, Image Captioning | Unverified | 0 |
| Domain Adaptation for Neural Networks by Parameter Augmentation | Jul 1, 2016 | Caption Generation, Domain Adaptation | Unverified | 0 |
| Do Large Multimodal Models Solve Caption Generation for Scientific Figures? Lessons Learned from SCICAP Challenge 2023 | Jan 31, 2025 | Articles, Caption Generation | Unverified | 0 |
| Does Object Grounding Really Reduce Hallucination of Large Vision-Language Models? | Jun 20, 2024 | Caption Generation, Hallucination | Unverified | 0 |
| Image Captioning using Facial Expression and Attention | Aug 8, 2019 | Caption Generation, Image Captioning | Unverified | 0 |
| Attention-based transformer models for image captioning across languages: An in-depth survey and evaluation | Jun 3, 2025 | Caption Generation, Image Captioning | Unverified | 0 |
| Image Caption Generation Framework for Assamese News using Attention Mechanism | Dec 1, 2021 | Caption Generation, Decoder | Unverified | 0 |
| Auto-ACD: A Large-scale Dataset for Audio-Language Representation Learning | Sep 20, 2023 | Audio captioning, Caption Generation | Unverified | 0 |
| Image Caption Generation for Low-Resource Assamese Language | Nov 1, 2022 | Caption Generation, Decoder | Unverified | 0 |
| IG Captioner: Information Gain Captioners are Strong Zero-shot Classifiers | Nov 27, 2023 | Caption Generation, Image-text Retrieval | Unverified | 0 |
| Chittron: An Automatic Bangla Image Captioning System | Sep 2, 2018 | Caption Generation, Image Captioning | Unverified | 0 |
| Image to Bengali Caption Generation Using Deep CNN and Bidirectional Gated Recurrent Unit | Dec 22, 2020 | Caption Generation, Decoder | Unverified | 0 |
| Diverse and Accurate Image Description Using a Variational Auto-Encoder with an Additive Gaussian Encoding Space | Nov 19, 2017 | Caption Generation, Image Description | Unverified | 0 |
| Image Captioning with Integrated Bottom-Up and Multi-level Residual Top-Down Attention for Game Scene Understanding | Jun 16, 2019 | Caption Generation, Image Captioning | Unverified | 0 |
| Improving Image Captioning with Better Use of Caption | Jul 1, 2020 | Caption Generation, Image Captioning | Unverified | 0 |