| Self-supervised Cross-view Representation Reconstruction for Change Captioning | Sep 28, 2023 | Caption GenerationHallucination | CodeCode Available | 1 |
| Croc: Pretraining Large Multimodal Models with Cross-Modal Comprehension | Oct 18, 2024 | Caption Generation | CodeCode Available | 1 |
| GL-RG: Global-Local Representation Granularity for Video Captioning | May 22, 2022 | Caption GenerationDescriptive | CodeCode Available | 1 |
| SoccerNet-Echoes: A Soccer Game Audio Commentary Dataset | May 12, 2024 | Action SpottingAutomatic Speech Recognition | CodeCode Available | 1 |
| Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data | Oct 2, 2024 | Audio ClassificationCaption Generation | CodeCode Available | 1 |
| Injecting Semantic Concepts into End-to-End Image Captioning | Dec 9, 2021 | Caption GenerationImage Captioning | CodeCode Available | 1 |
| Team RUC_AIM3 Technical Report at ActivityNet 2021: Entities Object Localization | Jun 11, 2021 | Caption GenerationObject | CodeCode Available | 1 |
| Topic Scene Graph Generation by Attention Distillation from Caption | Oct 12, 2021 | Caption GenerationGraph Generation | CodeCode Available | 1 |
| Transferable Decoding with Visual Entities for Zero-Shot Image Captioning | Jul 31, 2023 | Caption GenerationHallucination | CodeCode Available | 1 |
| Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning | Sep 6, 2023 | 3D dense captioningCaption Generation | CodeCode Available | 1 |