| NeuSyRE: Neuro-Symbolic Visual Understanding and Reasoning Framework based on Scene Graph Enrichment | Nov 5, 2023 | Caption Generation, Common Sense Reasoning | Code Available | 1 |
| VLIS: Unimodal Language Models Guide Multimodal Language Generation | Oct 15, 2023 | Caption Generation, Explanation Generation | Code Available | 1 |
| Self-supervised Cross-view Representation Reconstruction for Change Captioning | Sep 28, 2023 | Caption Generation, Hallucination | Code Available | 1 |
| RECAP: Retrieval-Augmented Audio Captioning | Sep 18, 2023 | AudioCaps, Audio captioning | Code Available | 1 |
| MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response | Sep 15, 2023 | Caption Generation, Language Modelling | Code Available | 1 |
| Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning | Sep 6, 2023 | 3D dense captioning, Caption Generation | Code Available | 1 |
| Transferable Decoding with Visual Entities for Zero-Shot Image Captioning | Jul 31, 2023 | Caption Generation, Hallucination | Code Available | 1 |
| Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation | Jan 2, 2023 | Caption Generation, Instance Segmentation | Code Available | 1 |
| Visual Commonsense-aware Representation Network for Video Captioning | Nov 17, 2022 | Caption Generation, Question Answering | Code Available | 1 |
| EfficientVLM: Fast and Accurate Vision-Language Models via Knowledge Distillation and Modal-adaptive Pruning | Oct 14, 2022 | Caption Generation, Knowledge Distillation | Code Available | 1 |
| Belief Revision based Caption Re-ranker with Visual Semantic Information | Sep 16, 2022 | Caption Generation, Image Captioning | Code Available | 1 |
| Rethinking Surgical Captioning: End-to-End Window-Based MLP Transformer Using Patches | Jun 30, 2022 | Caption Generation, Video Captioning | Code Available | 1 |
| GL-RG: Global-Local Representation Granularity for Video Captioning | May 22, 2022 | Caption Generation, Descriptive | Code Available | 1 |
| Spatiality-guided Transformer for 3D Dense Captioning on Point Clouds | Apr 22, 2022 | 3D dense captioning, 3D Object Detection | Code Available | 1 |
| Injecting Semantic Concepts into End-to-End Image Captioning | Dec 9, 2021 | Caption Generation, Image Captioning | Code Available | 1 |
| Controllable Video Captioning with an Exemplar Sentence | Dec 2, 2021 | Caption Generation, Decoder | Code Available | 1 |
| SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning | Nov 25, 2021 | Caption Generation, Question Answering | Code Available | 1 |
| Topic Scene Graph Generation by Attention Distillation from Caption | Oct 12, 2021 | Caption Generation, Graph Generation | Code Available | 1 |
| COSMic: A Coherence-Aware Generation Metric for Image Descriptions | Sep 11, 2021 | Caption Generation, Image Captioning | Code Available | 1 |
| End-to-End Dense Video Captioning with Parallel Decoding | Aug 17, 2021 | Caption Generation, Dense Video Captioning | Code Available | 1 |
| Team RUC_AIM3 Technical Report at ActivityNet 2021: Entities Object Localization | Jun 11, 2021 | Caption Generation, Object | Code Available | 1 |
| Connecting What to Say With Where to Look by Modeling Human Attention Traces | May 12, 2021 | Caption Generation, Image Captioning | Code Available | 1 |
| Towards Accurate Text-based Image Captioning with Content Diversity Exploration | Apr 23, 2021 | Caption Generation, Diversity | Code Available | 1 |
| Human-like Controllable Image Captioning with Verb-specific Semantic Roles | Mar 22, 2021 | Caption Generation, controllable image captioning | Code Available | 1 |
| Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts | Feb 17, 2021 | Caption Generation, Diversity | Code Available | 1 |