SOTAVerified

Video Captioning

Video Captioning is the task of automatically generating a caption for a video by understanding the actions and events it contains, which can help retrieve the video efficiently through text.

Source: NITS-VC System for VATEX Video Captioning Challenge 2020

Papers

Showing 201-250 of 473 papers

Title | Status | Hype
Syntax Customized Video Captioning by Imitating Exemplar Sentences | Code | 0
Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning | Code | 0
Temporal Tessellation: A Unified Approach for Video Analysis | Code | 0
Top-down Visual Saliency Guided by Captions | Code | 0
Towards Automatic Learning of Procedures from Web Instructional Videos | Code | 0
Tragedy Plus Time: Capturing Unintended Human Activities from Weakly-labeled Videos | Code | 0
Translating Videos to Natural Language Using Deep Recurrent Neural Networks | Code | 0
VideoBERT: A Joint Model for Video and Language Representation Learning | Code | 0
Video Description using Bidirectional Recurrent Neural Networks | Code | 0
Video Summarization: Towards Entity-Aware Captions | Code | 0
Visual Transformation Telling | Code | 0
VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding | Code | 0
Watch, Listen, and Describe: Globally and Locally Aligned Cross-Modal Attentions for Video Captioning | Code | 0
Knowledge Guided Entity-aware Video Captioning and A Basketball Benchmark | - | 0
Fine-Grained Video Captioning for Sports Narrative | - | 0
LASER: A Neuro-Symbolic Framework for Learning Spatial-Temporal Scene Graphs with Weak Supervision | - | 0
Fine-grained length controllable video captioning with ordinal embeddings | - | 0
Rethinking and Improving Natural Language Generation with Layer-Wise Multi-View Decoding | - | 0
Learning Actions from Human Demonstration Video for Robotic Manipulation | - | 0
Learning Audio-Video Modalities from Image Captions | - | 0
FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity | - | 0
Learning Interactive Real-World Simulators | - | 0
Fill-in-the-Blank: A Challenging Video Understanding Evaluation Framework | - | 0
Learning to Compose Topic-Aware Mixture of Experts for Zero-Shot Video Captioning | - | 0
Fighting FIRe with FIRE: Assessing the Validity of Text-to-Video Retrieval Benchmarks | - | 0
Exploring the Role of Audio in Video Captioning | - | 0
Less Is More: Picking Informative Frames for Video Captioning | - | 0
Video Captioning with Text-based Dynamic Attention and Step-by-Step Learning | - | 0
Exploring Temporal Event Cues for Dense Video Captioning in Cyclic Co-learning | - | 0
LongCaptioning: Unlocking the Power of Long Caption Generation in Large Multimodal Models | - | 0
Video Captioning with Transferred Semantic Attributes | - | 0
Low-Rank HOCA: Efficient High-Order Cross-Modal Attention for Video Captioning | - | 0
Exploring Group Video Captioning with Efficient Relational Approximation | - | 0
Exploration of Visual Features and their weighted-additive fusion for Video Captioning | - | 0
M3: Multimodal Memory Modelling for Video Captioning | - | 0
Exploiting long-term temporal dynamics for video captioning | - | 0
Attention is all you need for Videos: Self-attention based Video Summarization using Universal Transformers | - | 0
MAMS: Model-Agnostic Module Selection Framework for Video Captioning | - | 0
Exo2EgoDVC: Dense Video Captioning of Egocentric Procedural Activities Using Web Instructional Videos | - | 0
MAViC: Multimodal Active Learning for Video Captioning | - | 0
MCF-VC: Mitigate Catastrophic Forgetting in Class-Incremental Learning for Multimodal Video Captioning | - | 0
AdaCM^2: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction | - | 0
EVLM: An Efficient Vision-Language Model for Visual Understanding | - | 0
Attention based video captioning framework for Hindi | - | 0
Attention Based Encoder Decoder Model for Video Captioning in Nepali (2023) | - | 0
Attend and Interact: Higher-Order Object Interactions for Video Understanding | - | 0
Middle-Out Decoding | - | 0
MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models | - | 0
Modality Alignment between Deep Representations for Effective Video-and-Language Learning | - | 0
Models See Hallucinations: Evaluating the Factuality in Video Captioning | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | mPLUG-2 | CIDEr | 80 | - | Unverified
2 | VAST | CIDEr | 78 | - | Unverified
3 | GIT2 | CIDEr | 75.9 | - | Unverified
4 | VLAB | CIDEr | 74.9 | - | Unverified
5 | COSA | CIDEr | 74.7 | - | Unverified
6 | VALOR | CIDEr | 74 | - | Unverified
7 | MaMMUT (ours) | CIDEr | 73.6 | - | Unverified
8 | VideoCoCa | CIDEr | 73.2 | - | Unverified
9 | RTQ | CIDEr | 69.3 | - | Unverified
10 | HowToCaption | CIDEr | 65.3 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MaMMUT | CIDEr | 195.6 | - | Unverified
2 | VLAB | CIDEr | 179.8 | - | Unverified
3 | COSA | CIDEr | 178.5 | - | Unverified
4 | VALOR | CIDEr | 178.5 | - | Unverified
5 | mPLUG-2 | CIDEr | 165.8 | - | Unverified
6 | HowToCaption | CIDEr | 154.2 | - | Unverified
7 | HiTeA | CIDEr | 146.9 | - | Unverified
8 | Vid2Seq | CIDEr | 146.2 | - | Unverified
9 | VIOLETv2 | CIDEr | 139.2 | - | Unverified
10 | RTQ | CIDEr | 123.4 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VAST | BLEU-4 | 18.2 | - | Unverified
2 | UniVL + MELTR | BLEU-4 | 17.92 | - | Unverified
3 | UniVL | BLEU-4 | 17.35 | - | Unverified
4 | VideoCoCa | BLEU-4 | 14.2 | - | Unverified
5 | VLM | BLEU-4 | 12.27 | - | Unverified
6 | E2vidD6-MASSvid-BiD | BLEU-4 | 12.04 | - | Unverified
7 | TextKG | BLEU-4 | 11.7 | - | Unverified
8 | COOT | BLEU-4 | 11.3 | - | Unverified
9 | COSA | BLEU-4 | 10.1 | - | Unverified
10 | HowToCaption | BLEU-4 | 8.8 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VALOR | BLEU-4 | 45.6 | - | Unverified
2 | VAST | BLEU-4 | 45 | - | Unverified
3 | COSA | BLEU-4 | 43.7 | - | Unverified
4 | VideoCoCa | BLEU-4 | 39.7 | - | Unverified
5 | IcoCap (ViT-B/16) | BLEU-4 | 37.4 | - | Unverified
6 | IcoCap (ViT-B/32) | BLEU-4 | 36.9 | - | Unverified
7 | VASTA (Kinetics-backbone) | BLEU-4 | 36.25 | - | Unverified
8 | CoCap (ViT/L14) | BLEU-4 | 35.8 | - | Unverified
9 | ORG-TRL | BLEU-4 | 32.1 | - | Unverified
10 | NITS-VC | BLEU-4 | 20 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VideoCoCa | BLEU4 | 14.7 | - | Unverified
2 | VLTinT (ae-test split) C3D/Ling | BLEU4 | 14.5 | - | Unverified
3 | VLCap (ae-test split) - Appearance + Language | BLEU4 | 13.38 | - | Unverified
4 | COOT (ae-test split) - Only Appearance features | BLEU4 | 10.85 | - | Unverified
5 | MART (ae-test split) - Appearance + Flow | BLEU4 | 10.33 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CEN | CIDEr | 49.87 | - | Unverified
2 | GIT | CIDEr | 32.43 | - | Unverified
3 | SEM-POS | CIDEr | 26.01 | - | Unverified
4 | AKGNN | CIDEr | 25.9 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CEN | CIDEr | 63.51 | - | Unverified
2 | GIT | CIDEr | 45.63 | - | Unverified
3 | SEM-POS | CIDEr | 37.16 | - | Unverified
4 | AKGNN | CIDEr | 35.08 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SBD_Keyframe | BLEU4 | 41.01 | - | Unverified
2 | V+S-Att-based | BLEU4 | 36.2 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VAST | BLEU-4 | 19.9 | - | Unverified
2 | COSA | BLEU-4 | 18.8 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | GVT | BLEU4 | 17.7 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VNS-GRU (Cross-Lingual) | BLEU-4 | 58.68 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Shot2Story | CIDEr | 37.4 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Vid2Seq | CIDEr | 120.5 | - | Unverified