| Transferable Decoding with Visual Entities for Zero-Shot Image Captioning | Jul 31, 2023 | Caption Generation, Hallucination | Code Available | 1 |
| Croc: Pretraining Large Multimodal Models with Cross-Modal Comprehension | Oct 18, 2024 | Caption Generation | Code Available | 1 |
| Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation | Jan 2, 2023 | Caption Generation, Instance Segmentation | Code Available | 1 |
| Video captioning with recurrent networks based on frame- and video-level features and visual content classification | Dec 9, 2015 | Caption Generation, General Classification | Code Available | 1 |
| Frame- and Segment-Level Features and Candidate Pool Evaluation for Video Caption Generation | Aug 17, 2016 | Caption Generation, Decoder | Code Available | 1 |
| HCQA @ Ego4D EgoSchema Challenge 2024 | Jun 22, 2024 | Caption Generation | Code Available | 1 |
| Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning | Jul 16, 2024 | Caption Generation, cross-modal alignment | Code Available | 1 |
| Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs | Mar 1, 2020 | Attribute, Caption Generation | Code Available | 1 |
| TAP: Text-Aware Pre-training for Text-VQA and Text-Caption | Dec 8, 2020 | Caption Generation, Language Modeling | Code Available | 1 |
| Deep Reinforcement Learning For Sequence to Sequence Models | May 24, 2018 | Abstractive Text Summarization, Caption Generation | Code Available | 1 |
| Denoising Large-Scale Image Captioning from Alt-text Data using Content Selection Models | Sep 17, 2021 | Caption Generation, Denoising | Unverified | 0 |
| Deep Verifier Networks: Verification of Deep Discriminative Models with Deep Generative Models | Nov 18, 2019 | Anomaly Detection, Autonomous Driving | Unverified | 0 |
| End-to-End Video Captioning | Apr 4, 2019 | Action Recognition, Caption Generation | Unverified | 0 |
| Deep Learning Approaches on Image Captioning: A Review | Jan 31, 2022 | Caption Generation, Deep Learning | Unverified | 0 |
| VidCoM: Fast Video Comprehension through Large Language Models with Multimodal Tools | Oct 16, 2023 | Caption Generation, Descriptive | Unverified | 0 |
| Advancing Large Multi-modal Models with Explicit Chain-of-Reasoning and Visual Question Generation | Jan 18, 2024 | Caption Generation, Language Modeling | Unverified | 0 |
| Bi-directional Contextual Attention for 3D Dense Captioning | Aug 13, 2024 | 3D dense captioning, Attribute | Unverified | 0 |
| Deep Bayesian Natural Language Processing | Jul 1, 2019 | Caption Generation, Clustering | Unverified | 0 |
| An encoder-decoder based framework for hindi image caption generation | Jul 9, 2021 | Caption Generation, Decoder | Unverified | 0 |
| DECap: Towards Generalized Explicit Caption Editing via Diffusion Mechanism | Nov 25, 2023 | Caption Generation, Denoising | Unverified | 0 |
| D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding | Dec 2, 2021 | 3D dense captioning, 3D visual grounding | Unverified | 0 |
| BEV-TSR: Text-Scene Retrieval in BEV Space for Autonomous Driving | Jan 2, 2024 | Autonomous Driving, Caption Generation | Unverified | 0 |
| Error Causal inference for Multi-Fusion models | Jun 1, 2021 | Caption Generation, Causal Inference | Unverified | 0 |
| Evaluation of Automatic Video Captioning Using Direct Assessment | Oct 29, 2017 | Caption Generation, Machine Translation | Unverified | 0 |
| Cross-modal Coherence Modeling for Caption Generation | Jul 1, 2020 | Caption Generation, controllable image captioning | Unverified | 0 |