| Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs | Mar 1, 2020 | Attribute, Caption Generation | Code Available | 1 | 5 |
| Croc: Pretraining Large Multimodal Models with Cross-Modal Comprehension | Oct 18, 2024 | Caption Generation | Code Available | 1 | 5 |
| Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation | Jan 2, 2023 | Caption Generation, Instance Segmentation | Code Available | 1 | 5 |
| Large-scale Pre-training for Grounded Video Caption Generation | Mar 13, 2025 | Caption Generation | Code Available | 1 | 5 |
| SoccerNet-Echoes: A Soccer Game Audio Commentary Dataset | May 12, 2024 | Action Spotting, Automatic Speech Recognition | Code Available | 1 | 5 |
| EfficientVLM: Fast and Accurate Vision-Language Models via Knowledge Distillation and Modal-adaptive Pruning | Oct 14, 2022 | Caption Generation, Knowledge Distillation | Code Available | 1 | 5 |
| Microsoft COCO Captions: Data Collection and Evaluation Server | Apr 1, 2015 | Caption Generation | Code Available | 1 | 5 |
| End-to-End Dense Video Captioning with Parallel Decoding | Aug 17, 2021 | Caption Generation, Dense Video Captioning | Code Available | 1 | 5 |
| Team RUC_AIM3 Technical Report at ActivityNet 2021: Entities Object Localization | Jun 11, 2021 | Caption Generation, Object | Code Available | 1 | 5 |
| Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds | Apr 22, 2022 | 3D dense captioning, 3D Object Detection | Code Available | 1 | 5 |
| Multimodal Preference Data Synthetic Alignment with Reward Model | Dec 23, 2024 | 2k, Caption Generation | Code Available | 0 | 5 |
| Bivariate Beta-LSTM | May 25, 2019 | Caption Generation, Density Estimation | Code Available | 0 | 5 |
| Multi-source weak supervision for saliency detection | Apr 1, 2019 | Caption Generation, Saliency Detection | Code Available | 0 | 5 |
| Multi-LLM Collaborative Caption Generation in Scientific Documents | Jan 5, 2025 | Caption Generation, Image to text | Code Available | 0 | 5 |
| DeepDiary: Automatic Caption Generation for Lifelogging Image Streams | Aug 12, 2016 | Caption Generation, Image Captioning | Code Available | 0 | 5 |
| Memeify: A Large-Scale Meme Generation System | Oct 27, 2019 | Caption Generation, Decoder | Code Available | 0 | 5 |
| Local Information Assisted Attention-free Decoder for Audio Captioning | Jan 10, 2022 | Audio captioning, Caption Generation | Code Available | 0 | 5 |
| Mol2Lang-VLM: Vision- and Text-Guided Generative Pre-trained Language Models for Advancing Molecule Captioning through Multimodal Fusion | Aug 15, 2024 | Caption Generation, Decoder | Code Available | 0 | 5 |
| Journalistic Guidelines Aware News Image Captioning | Sep 7, 2021 | Caption Generation, Descriptive | Code Available | 0 | 5 |
| An Empirical Study of Language CNN for Image Captioning | Dec 21, 2016 | Caption Generation, Image Captioning | Code Available | 0 | 5 |
| LAViTeR: Learning Aligned Visual and Textual Representations Assisted by Image and Caption Generation | Sep 4, 2021 | Caption Generation, Image Captioning | Code Available | 0 | 5 |
| Cortico-cerebellar networks as decoupling neural interfaces | Oct 21, 2021 | Caption Generation | Code Available | 0 | 5 |
| Dual-path Collaborative Generation Network for Emotional Video Captioning | Aug 6, 2024 | Caption Generation, Video Captioning | Code Available | 0 | 5 |
| Controllable Video Captioning with POS Sequence Guidance Based on Gated Fusion Network | Aug 27, 2019 | Caption Generation, Decoder | Code Available | 0 | 5 |
| Bangla Image Caption Generation through CNN-Transformer based Encoder-Decoder Network | Oct 24, 2021 | Caption Generation, Decoder | Code Available | 0 | 5 |