| Stacked Cross-modal Feature Consolidation Attention Networks for Image Captioning | Feb 8, 2023 | Caption Generation, Decoder | Unverified | 0 |
| Transform, Contrast and Tell: Coherent Entity-Aware Multi-Image Captioning | Feb 4, 2023 | Caption Generation, Coherence Evaluation | Code Available | 0 |
| Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation | Jan 2, 2023 | Caption Generation, Instance Segmentation | Code Available | 1 |
| Uncertainty-Aware Image Captioning | Nov 30, 2022 | Caption Generation, Image Captioning | Unverified | 0 |
| Retrieval-Augmented Multimodal Language Modeling | Nov 22, 2022 | Caption Generation, Image Captioning | Unverified | 0 |
| Visual Commonsense-aware Representation Network for Video Captioning | Nov 17, 2022 | Caption Generation, Question Answering | Code Available | 1 |
| Event and Entity Extraction from Generated Video Captions | Nov 5, 2022 | Caption Generation, Dense Video Captioning | Code Available | 0 |
| Image Caption Generation for Low-Resource Assamese Language | Nov 1, 2022 | Caption Generation, Decoder | Unverified | 0 |
| EfficientVLM: Fast and Accurate Vision-Language Models via Knowledge Distillation and Modal-adaptive Pruning | Oct 14, 2022 | Caption Generation, Knowledge Distillation | Code Available | 1 |
| Generating image captions with external encyclopedic knowledge | Oct 10, 2022 | Caption Generation, Image Captioning | Unverified | 0 |