| Controllable Video Captioning with an Exemplar Sentence | Dec 2, 2021 | Caption Generation, Decoder | Code Available | 1 |
| SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning | Nov 25, 2021 | Caption Generation, Question Answering | Code Available | 1 |
| Topic Scene Graph Generation by Attention Distillation from Caption | Oct 12, 2021 | Caption Generation, Graph Generation | Code Available | 1 |
| COSMic: A Coherence-Aware Generation Metric for Image Descriptions | Sep 11, 2021 | Caption Generation, Image Captioning | Code Available | 1 |
| End-to-End Dense Video Captioning with Parallel Decoding | Aug 17, 2021 | Caption Generation, Dense Video Captioning | Code Available | 1 |
| Team RUC_AIM3 Technical Report at ActivityNet 2021: Entities Object Localization | Jun 11, 2021 | Caption Generation, Object | Code Available | 1 |
| Connecting What to Say With Where to Look by Modeling Human Attention Traces | May 12, 2021 | Caption Generation, Image Captioning | Code Available | 1 |
| Towards Accurate Text-based Image Captioning with Content Diversity Exploration | Apr 23, 2021 | Caption Generation, Diversity | Code Available | 1 |
| Human-like Controllable Image Captioning with Verb-specific Semantic Roles | Mar 22, 2021 | Caption Generation, controllable image captioning | Code Available | 1 |
| Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts | Feb 17, 2021 | Caption Generation, Diversity | Code Available | 1 |