| MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response | Sep 15, 2023 | Caption Generation, Language Modelling | Code Available | 1 |
| Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning | Sep 6, 2023 | 3D dense captioning, Caption Generation | Code Available | 1 |
| ViCo: Engaging Video Comment Generation with Human Preference Rewards | Aug 22, 2023 | Caption Generation, Comment Generation | Unverified | 0 |
| Music Understanding LLaMA: Advancing Text-to-Music Generation with Question Answering and Captioning | Aug 22, 2023 | Caption Generation, Large Language Model | Code Available | 2 |
| Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions | Aug 8, 2023 | Caption Generation, Image Captioning | Code Available | 2 |
| Transferable Decoding with Visual Entities for Zero-Shot Image Captioning | Jul 31, 2023 | Caption Generation, Hallucination | Code Available | 1 |
| FigCaps-HF: A Figure-to-Caption Generative Framework and Benchmark with Human Feedback | Jul 20, 2023 | Caption Generation | Code Available | 0 |
| AIC-AB NET: A Neural Network for Image Captioning with Spatial Attention and Text Attributes | Jul 14, 2023 | Attribute, Caption Generation | Unverified | 0 |
| Multi-Similarity Contrastive Learning | Jul 6, 2023 | Caption Generation, Contrastive Learning | Unverified | 0 |
| Knowledge Distillation for Efficient Audio-Visual Video Captioning | Jun 16, 2023 | Audio-Visual Video Captioning, Caption Generation | Unverified | 0 |
| SciCap+: A Knowledge Augmented Dataset to Study the Challenges of Scientific Figure Captioning | Jun 6, 2023 | Caption Generation, Image Captioning | Code Available | 0 |
| CapText: Large Language Model-based Caption Generation From Image Context and Description | Jun 1, 2023 | Caption Generation, Image to text | Unverified | 0 |
| RealignDiff: Boosting Text-to-Image Diffusion Model with Coarse-to-fine Semantic Re-alignment | May 31, 2023 | Caption Generation, Language Modelling | Unverified | 0 |
| HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning | May 25, 2023 | Caption Generation, Decoder | Unverified | 0 |
| DiffCap: Exploring Continuous Diffusion on Image Captioning | May 20, 2023 | Caption Generation, Diversity | Unverified | 0 |
| Efficient Audio Captioning Transformer with Patchout and Text Guidance | Apr 6, 2023 | Audio captioning, Caption Generation | Unverified | 0 |
| Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models | Apr 5, 2023 | Caption Generation, Image Generation | Unverified | 0 |
| Multi-modal reward for visual relationships-based image captioning | Mar 19, 2023 | Caption Generation, Deep Reinforcement Learning | Unverified | 0 |
| GNNFormer: A Graph-based Framework for Cytopathology Report Generation | Mar 17, 2023 | Caption Generation, Graph Neural Network | Unverified | 0 |
| Summaries as Captions: Generating Figure Captions for Scientific Documents with Automated Text Summarization | Feb 23, 2023 | Abstractive Text Summarization, Caption Generation | Code Available | 0 |
| Stacked Cross-modal Feature Consolidation Attention Networks for Image Captioning | Feb 8, 2023 | Caption Generation, Decoder | Unverified | 0 |
| Transform, Contrast and Tell: Coherent Entity-Aware Multi-Image Captioning | Feb 4, 2023 | Caption Generation, Coherence Evaluation | Code Available | 0 |
| Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation | Jan 2, 2023 | Caption Generation, Instance Segmentation | Code Available | 1 |
| Uncertainty-Aware Image Captioning | Nov 30, 2022 | Caption Generation, Image Captioning | Unverified | 0 |
| Retrieval-Augmented Multimodal Language Modeling | Nov 22, 2022 | Caption Generation, Image Captioning | Unverified | 0 |
| Visual Commonsense-aware Representation Network for Video Captioning | Nov 17, 2022 | Caption Generation, Question Answering | Code Available | 1 |
| Event and Entity Extraction from Generated Video Captions | Nov 5, 2022 | Caption Generation, Dense Video Captioning | Code Available | 0 |
| Image Caption Generation for Low-Resource Assamese Language | Nov 1, 2022 | Caption Generation, Decoder | Unverified | 0 |
| EfficientVLM: Fast and Accurate Vision-Language Models via Knowledge Distillation and Modal-adaptive Pruning | Oct 14, 2022 | Caption Generation, Knowledge Distillation | Code Available | 1 |
| Generating image captions with external encyclopedic knowledge | Oct 10, 2022 | Caption Generation, Image Captioning | Unverified | 0 |
| REST: REtrieve & Self-Train for generative action recognition | Sep 29, 2022 | Action Recognition, Caption Generation | Unverified | 0 |
| Medical Image Captioning via Generative Pretrained Transformers | Sep 28, 2022 | Caption Generation, Descriptive | Unverified | 0 |
| Word to Sentence Visual Semantic Similarity for Caption Generation: Lessons Learned | Sep 26, 2022 | Caption Generation, Semantic Similarity | Unverified | 0 |
| Belief Revision based Caption Re-ranker with Visual Semantic Information | Sep 16, 2022 | Caption Generation, Image Captioning | Code Available | 1 |
| Rethinking Surgical Captioning: End-to-End Window-Based MLP Transformer Using Patches | Jun 30, 2022 | Caption Generation, Video Captioning | Code Available | 1 |
| Examining the Effects of Language-and-Vision Data Augmentation for Generation of Descriptions of Human Faces | Jun 1, 2022 | Caption Generation, Data Augmentation | Unverified | 0 |
| Aligning Images and Text with Semantic Role Labels for Fine-Grained Cross-Modal Understanding | Jun 1, 2022 | Caption Generation, Image Retrieval | Unverified | 0 |
| Multilingual Image Corpus – Towards a Multimodal and Multilingual Dataset | Jun 1, 2022 | Caption Generation, image-classification | Unverified | 0 |
| Fine-grained Image Captioning with CLIP Reward | May 26, 2022 | Caption Generation, Descriptive | Code Available | 2 |
| GL-RG: Global-Local Representation Granularity for Video Captioning | May 22, 2022 | Caption Generation, Descriptive | Code Available | 1 |
| Automated Audio Captioning: An Overview of Recent Progress and New Challenges | May 12, 2022 | Audio captioning, Caption Generation | Unverified | 0 |
| Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds | Apr 22, 2022 | 3D dense captioning, 3D Object Detection | Code Available | 1 |
| Guiding Attention using Partial-Order Relationships for Image Captioning | Apr 15, 2022 | Caption Generation, Image Captioning | Unverified | 0 |
| NICGSlowDown: Evaluating the Efficiency Robustness of Neural Image Caption Generation Models | Mar 29, 2022 | Caption Generation | Code Available | 0 |
| NOC-REK: Novel Object Captioning with Retrieved Vocabulary from External Knowledge | Mar 28, 2022 | Caption Generation, Object | Unverified | 0 |
| A Deep Neural Framework for Image Caption Generation Using GRU-Based Attention Mechanism | Mar 3, 2022 | Caption Generation, Decoder | Unverified | 0 |
| Deep Learning Approaches on Image Captioning: A Review | Jan 31, 2022 | Caption Generation, Deep Learning | Unverified | 0 |
| Local Information Assisted Attention-free Decoder for Audio Captioning | Jan 10, 2022 | Audio captioning, Caption Generation | Code Available | 0 |
| MAGIC: Multimodal relAtional Graph adversarIal inferenCe for Diverse and Unpaired Text-based Image Captioning | Dec 13, 2021 | Caption Generation, Descriptive | Unverified | 0 |
| Injecting Semantic Concepts into End-to-End Image Captioning | Dec 9, 2021 | Caption Generation, Image Captioning | Code Available | 1 |