SOTAVerified

Video Captioning

Video captioning is the task of automatically generating a caption for a video by understanding the actions and events in it, which can help retrieve the video efficiently through text.

Source: NITS-VC System for VATEX Video Captioning Challenge 2020

Papers

Showing 251-300 of 473 papers

Title | Status | Hype
Evaluation of Automatic Video Captioning Using Direct Assessment | | 0
EVC-MF: End-to-end Video Captioning Network with Multi-scale Features | | 0
Event-Equalized Dense Video Captioning | | 0
EVLM: An Efficient Vision-Language Model for Visual Understanding | | 0
Exo2EgoDVC: Dense Video Captioning of Egocentric Procedural Activities Using Web Instructional Videos | | 0
Exploiting long-term temporal dynamics for video captioning | | 0
Exploration of Visual Features and their weighted-additive fusion for Video Captioning | | 0
Exploring Group Video Captioning with Efficient Relational Approximation | | 0
Exploring Temporal Event Cues for Dense Video Captioning in Cyclic Co-learning | | 0
Exploring the Role of Audio in Video Captioning | | 0
Fighting FIRe with FIRE: Assessing the Validity of Text-to-Video Retrieval Benchmarks | | 0
Fill-in-the-Blank: A Challenging Video Understanding Evaluation Framework | | 0
FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity | | 0
Fine-grained length controllable video captioning with ordinal embeddings | | 0
Fine-Grained Video Captioning for Sports Narrative | | 0
Fine-Grained Video Captioning through Scene Graph Consolidation | | 0
CaReBench: A Fine-Grained Benchmark for Video Captioning and Retrieval | | 0
From Deterministic to Generative: Multi-Modal Stochastic RNNs for Video Captioning | | 0
Exploiting Auxiliary Caption for Video Grounding | | 0
Generating Video Descriptions with Topic Guidance | | 0
Generative Adversarial Network Applications in Creating a Meta-Universe | | 0
Get In Video: Add Anything You Want to the Video | | 0
Global2Local: A Joint-Hierarchical Attention for Video Captioning | | 0
Graph Similarities and Dual Approach for Sequential Text-to-Image Retrieval | | 0
Grounded Objects and Interactions for Video Captioning | | 0
GUI Action Narrator: Where and When Did That Action Take Place? | | 0
Guidance Module Network for Video Captioning | | 0
Guiding the Flowing of Semantics: Interpretable Video Captioning via POS Tag | | 0
Hierarchical Boundary-Aware Neural Encoder for Video Captioning | | 0
Hierarchical LSTMs with Adaptive Attention for Visual Captioning | | 0
Hierarchical LSTM with Adjusted Temporal Attention for Video Captioning | | 0
Hierarchical memory decoder for visual narrating | | 0
Hierarchical Memory Decoding for Video Captioning | | 0
Hierarchical Multimodal Transformer to Summarize Videos | | 0
Hierarchical Recurrent Neural Encoder for Video Representation with Application to Captioning | | 0
Hierarchical Recurrent Neural Network for Video Summarization | | 0
HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training | | 0
HiVLP: Hierarchical Interactive Video-Language Pre-Training | | 0
Human Action Sequence Classification | | 0
Human-centric Behavior Description in Videos: New Benchmark and Model | | 0
HyperGLM: HyperGraph for Video Scene Graph Generation and Anticipation | | 0
IcoCap: Improving Video Captioning by Compounding Images | | 0
Image-to-Video Person Re-Identification by Reusing Cross-modal Embeddings | | 0
Imperial College London Submission to VATEX Video Captioning Task | | 0
Implicit and Explicit Commonsense for Multi-sentence Video Captioning | | 0
Improving Interpretability of Deep Neural Networks with Semantic Information | | 0
Incorporating Background Knowledge into Video Description Generation | | 0
Incorporating granularity bias as the margin into contrastive loss for video captioning | | 0
In-Home Daily-Life Captioning Using Radio Signals | | 0
Integrating Temporal and Spatial Attentions for VATEX Video Captioning Challenge 2019 | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | mPLUG-2 | CIDEr | 80 | | Unverified
2 | VAST | CIDEr | 78 | | Unverified
3 | GIT2 | CIDEr | 75.9 | | Unverified
4 | VLAB | CIDEr | 74.9 | | Unverified
5 | COSA | CIDEr | 74.7 | | Unverified
6 | VALOR | CIDEr | 74 | | Unverified
7 | MaMMUT (ours) | CIDEr | 73.6 | | Unverified
8 | VideoCoCa | CIDEr | 73.2 | | Unverified
9 | RTQ | CIDEr | 69.3 | | Unverified
10 | HowToCaption | CIDEr | 65.3 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MaMMUT | CIDEr | 195.6 | | Unverified
2 | VLAB | CIDEr | 179.8 | | Unverified
3 | COSA | CIDEr | 178.5 | | Unverified
4 | VALOR | CIDEr | 178.5 | | Unverified
5 | mPLUG-2 | CIDEr | 165.8 | | Unverified
6 | HowToCaption | CIDEr | 154.2 | | Unverified
7 | HiTeA | CIDEr | 146.9 | | Unverified
8 | Vid2Seq | CIDEr | 146.2 | | Unverified
9 | VIOLETv2 | CIDEr | 139.2 | | Unverified
10 | RTQ | CIDEr | 123.4 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VAST | BLEU-4 | 18.2 | | Unverified
2 | UniVL + MELTR | BLEU-4 | 17.92 | | Unverified
3 | UniVL | BLEU-4 | 17.35 | | Unverified
4 | VideoCoCa | BLEU-4 | 14.2 | | Unverified
5 | VLM | BLEU-4 | 12.27 | | Unverified
6 | E2vidD6-MASSvid-BiD | BLEU-4 | 12.04 | | Unverified
7 | TextKG | BLEU-4 | 11.7 | | Unverified
8 | COOT | BLEU-4 | 11.3 | | Unverified
9 | COSA | BLEU-4 | 10.1 | | Unverified
10 | HowToCaption | BLEU-4 | 8.8 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VALOR | BLEU-4 | 45.6 | | Unverified
2 | VAST | BLEU-4 | 45 | | Unverified
3 | COSA | BLEU-4 | 43.7 | | Unverified
4 | VideoCoCa | BLEU-4 | 39.7 | | Unverified
5 | IcoCap (ViT-B/16) | BLEU-4 | 37.4 | | Unverified
6 | IcoCap (ViT-B/32) | BLEU-4 | 36.9 | | Unverified
7 | VASTA (Kinetics-backbone) | BLEU-4 | 36.25 | | Unverified
8 | CoCap (ViT/L14) | BLEU-4 | 35.8 | | Unverified
9 | ORG-TRL | BLEU-4 | 32.1 | | Unverified
10 | NITS-VC | BLEU-4 | 20 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VideoCoCa | BLEU4 | 14.7 | | Unverified
2 | VLTinT (ae-test split) C3D/Ling | BLEU4 | 14.5 | | Unverified
3 | VLCap (ae-test split) - Appearance + Language | BLEU4 | 13.38 | | Unverified
4 | COOT (ae-test split) - Only Appearance features | BLEU4 | 10.85 | | Unverified
5 | MART (ae-test split) - Appearance + Flow | BLEU4 | 10.33 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CEN | CIDEr | 49.87 | | Unverified
2 | GIT | CIDEr | 32.43 | | Unverified
3 | SEM-POS | CIDEr | 26.01 | | Unverified
4 | AKGNN | CIDEr | 25.9 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CEN | CIDEr | 63.51 | | Unverified
2 | GIT | CIDEr | 45.63 | | Unverified
3 | SEM-POS | CIDEr | 37.16 | | Unverified
4 | AKGNN | CIDEr | 35.08 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SBD_Keyframe | BLEU4 | 41.01 | | Unverified
2 | V+S-Att-based | BLEU4 | 36.2 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VAST | BLEU-4 | 19.9 | | Unverified
2 | COSA | BLEU-4 | 18.8 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | GVT | BLEU4 | 17.7 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VNS-GRU (Cross-Lingual) | BLEU-4 | 58.68 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Shot2Story | CIDEr | 37.4 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Vid2Seq | CIDEr | 120.5 | | Unverified