| D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding | Dec 2, 2021 | 3D dense captioning, 3D visual grounding | Unverified | 0 |
| Controllable Video Captioning with an Exemplar Sentence | Dec 2, 2021 | Caption Generation, Decoder | Code Available | 1 |
| Image Caption Generation Framework for Assamese News using Attention Mechanism | Dec 1, 2021 | Caption Generation, Decoder | Unverified | 0 |
| Multi-modal Dependency Tree for Video Captioning | Dec 1, 2021 | Caption Generation, Dependency Parsing | Unverified | 0 |
| CLIP Meets Video Captioning: Concept-Aware Representation Learning Does Matter | Nov 30, 2021 | Caption Generation, Representation Learning | Code Available | 0 |
| SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning | Nov 25, 2021 | Caption Generation, Question Answering | Code Available | 1 |
| E-MMAD: Multimodal Advertising Caption Generation Based on Structured Information | Nov 16, 2021 | Caption Generation, valid | Unverified | 0 |
| Temporal Knowledge-Aware Image Captioning | Nov 16, 2021 | Caption Generation, Image Captioning | Unverified | 0 |
| AUTOMATED AUDIO CAPTIONING BY FINE-TUNING BART WITH AUDIOSET TAGS | Nov 15, 2021 | AudioCaps, Audio captioning | Code Available | 0 |
| R^3Net: Relation-embedded Representation Reconstruction Network for Change Captioning | Nov 1, 2021 | Caption Generation, Relation | Code Available | 0 |