| D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding | Dec 2, 2021 | 3D dense captioning3D visual grounding | —Unverified | 0 |
| Image Caption Generation Framework for Assamese News using Attention Mechanism | Dec 1, 2021 | Caption GenerationDecoder | —Unverified | 0 |
| Multi-modal Dependency Tree for Video Captioning | Dec 1, 2021 | Caption GenerationDependency Parsing | —Unverified | 0 |
| CLIP Meets Video Captioning: Concept-Aware Representation Learning Does Matter | Nov 30, 2021 | Caption GenerationRepresentation Learning | CodeCode Available | 0 |
| E-MMAD: Multimodal Advertising Caption Generation Based on Structured Information | Nov 16, 2021 | Caption Generationvalid | —Unverified | 0 |
| Temporal Knowledge-Aware Image Captioning | Nov 16, 2021 | Caption GenerationImage Captioning | —Unverified | 0 |
| AUTOMATED AUDIO CAPTIONING BY FINE-TUNING BART WITH AUDIOSET TAGS | Nov 15, 2021 | AudioCapsAudio captioning | CodeCode Available | 0 |
| Rˆ3Net:Relation-embedded Representation Reconstruction Network for Change Captioning | Nov 1, 2021 | Caption GenerationRelation | CodeCode Available | 0 |
| Bangla Image Caption Generation through CNN-Transformer based Encoder-Decoder Network | Oct 24, 2021 | Caption GenerationDecoder | CodeCode Available | 0 |
| Cortico-cerebellar networks as decoupling neural interfaces | Oct 21, 2021 | Caption Generation | CodeCode Available | 0 |