| Large-scale Pre-training for Grounded Video Caption Generation | Mar 13, 2025 | Caption Generation | Code Available | 1 |
| Frame- and Segment-Level Features and Candidate Pool Evaluation for Video Caption Generation | Aug 17, 2016 | Caption Generation, Decoder | Code Available | 1 |
| End-to-End Dense Video Captioning with Parallel Decoding | Aug 17, 2021 | Caption Generation, Dense Video Captioning | Code Available | 1 |
| Connecting What to Say With Where to Look by Modeling Human Attention Traces | May 12, 2021 | Caption Generation, Image Captioning | Code Available | 1 |
| Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning | Jul 16, 2024 | Caption Generation, cross-modal alignment | Code Available | 1 |
| EfficientVLM: Fast and Accurate Vision-Language Models via Knowledge Distillation and Modal-adaptive Pruning | Oct 14, 2022 | Caption Generation, Knowledge Distillation | Code Available | 1 |
| Croc: Pretraining Large Multimodal Models with Cross-Modal Comprehension | Oct 18, 2024 | Caption Generation | Code Available | 1 |
| Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts | Feb 17, 2021 | Caption Generation, Diversity | Code Available | 1 |
| Grad-CAM++: Improved Visual Explanations for Deep Convolutional Networks | Oct 30, 2017 | 3D Action Recognition, Action Recognition | Code Available | 1 |
| Controllable Video Captioning with an Exemplar Sentence | Dec 2, 2021 | Caption Generation, Decoder | Code Available | 1 |