| Grad-CAM++: Improved Visual Explanations for Deep Convolutional Networks | Oct 30, 2017 | 3D Action Recognition, Action Recognition | Code Available | 1 | 5 |
| GL-RG: Global-Local Representation Granularity for Video Captioning | May 22, 2022 | Caption Generation, Descriptive | Code Available | 1 | 5 |
| HCQA @ Ego4D EgoSchema Challenge 2024 | Jun 22, 2024 | Caption Generation | Code Available | 1 | 5 |
| Belief Revision based Caption Re-ranker with Visual Semantic Information | Sep 16, 2022 | Caption Generation, Image Captioning | Code Available | 1 | 5 |
| Controllable Video Captioning with an Exemplar Sentence | Dec 2, 2021 | Caption Generation, Decoder | Code Available | 1 | 5 |
| Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation | Jan 2, 2023 | Caption Generation, Instance Segmentation | Code Available | 1 | 5 |
| Connecting What to Say With Where to Look by Modeling Human Attention Traces | May 12, 2021 | Caption Generation, Image Captioning | Code Available | 1 | 5 |
| COSMic: A Coherence-Aware Generation Metric for Image Descriptions | Sep 11, 2021 | Caption Generation, Image Captioning | Code Available | 1 | 5 |
| Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts | Feb 17, 2021 | Caption Generation, Diversity | Code Available | 1 | 5 |
| Human-like Controllable Image Captioning with Verb-specific Semantic Roles | Mar 22, 2021 | Caption Generation, Controllable Image Captioning | Code Available | 1 | 5 |