
Video Captioning

Video captioning is the task of automatically generating a caption for a video by understanding the actions and events in it, which can help retrieve the video efficiently through text.

Source: NITS-VC System for VATEX Video Captioning Challenge 2020
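
To illustrate the retrieval use mentioned above, here is a minimal sketch of text-based video lookup over generated captions. It assumes captions are already available as plain strings; the video ids, captions, and the helper names `build_index` and `search` are hypothetical and not taken from any listed paper.

```python
from collections import defaultdict


def build_index(captions):
    """Map each lowercase word to the set of video ids whose caption contains it."""
    index = defaultdict(set)
    for video_id, caption in captions.items():
        for word in caption.lower().split():
            index[word].add(video_id)
    return index


def search(index, query):
    """Return ids of videos whose caption contains every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical captions, standing in for the output of a captioning model.
captions = {
    "vid_001": "a man is playing a guitar on stage",
    "vid_002": "a dog is running on the beach",
    "vid_003": "a woman is playing with a dog",
}
index = build_index(captions)
```

Calling `search(index, "playing dog")` returns only the videos whose caption contains both words. A practical system would likely use stemming or embedding-based matching rather than exact word overlap.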

Papers

Showing 401–450 of 473 papers

Title | Status | Hype
Incorporating Background Knowledge into Video Description Generation | - | 0
Temporally Grounding Natural Sentence in Video | - | 0
A Dataset for Telling the Stories of Social Media Videos | - | 0
Vector Learning for Cross Domain Representations | - | 0
MTLE: A Multitask Learning Encoder of Visual Feature Representations for Video and Movie Description | Code | 0
Bridge Video and Text with Cascade Syntactic Structure | - | 0
Move Forward and Tell: A Progressive Generator of Video Descriptions | - | 0
NMT-Keras: a Very Flexible Toolkit with a Focus on Interactive NMT and Online Learning | Code | 0
Video Captioning with Boundary-aware Hierarchical Language Decoding and Joint Video Prediction | - | 0
Deep Reinforcement Learning for NLP | - | 0
Best Vision Technologies Submission to ActivityNet Challenge 2018-Task: Dense-Captioning Events in Videos | - | 0
RUC+CMU: System Report for Dense Captioning Events in Videos | - | 0
Fine-Grained Video Captioning for Sports Narrative | - | 0
M3: Multimodal Memory Modelling for Video Captioning | - | 0
Interpretable Video Captioning via Trajectory Structured Localization | - | 0
Amortized Context Vector Inference for Sequence-to-Sequence Networks | - | 0
ECO: Efficient Convolutional Network for Online Video Understanding | Code | 0
Jointly Localizing and Describing Events for Dense Video Captioning | - | 0
Watch, Listen, and Describe: Globally and Locally Aligned Cross-Modal Attentions for Video Captioning | Code | 0
End-to-End Dense Video Captioning with Masked Transformer | Code | 0
Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning | Code | 0
Reconstruction Network for Video Captioning | Code | 0
End-to-End Video Captioning with Multitask Reinforcement Learning | Code | 0
Less Is More: Picking Informative Frames for Video Captioning | - | 0
Joint Event Detection and Description in Continuous Video Streams | Code | 0
Consensus-based Sequence Training for Video Captioning | - | 0
Video Captioning via Hierarchical Reinforcement Learning | - | 0
Excitation Backprop for RNNs | Code | 0
Grounded Objects and Interactions for Video Captioning | - | 0
Attend and Interact: Higher-Order Object Interactions for Video Understanding | - | 0
Procedural Text Generation from an Execution Video | - | 0
Evaluation of Automatic Video Captioning Using Direct Assessment | - | 0
Video Captioning with Guidance of Multimodal Latent Topics | - | 0
Generating Video Descriptions with Topic Guidance | - | 0
From Deterministic to Generative: Multi-Modal Stochastic RNNs for Video Captioning | - | 0
Reinforced Video Captioning with Entailment Rewards | - | 0
Supervising Neural Attention Models for Video Captioning by Human Gaze Data | - | 0
Task-Driven Dynamic Fusion: Reducing Ambiguity in Video Description | - | 0
Multimodal Machine Learning: Integrating Language, Vision and Speech | - | 0
Hierarchical LSTM with Adjusted Temporal Attention for Video Captioning | - | 0
Multi-Task Video Captioning with Video and Entailment Generation | - | 0
Weakly Supervised Dense Video Captioning | - | 0
Towards Automatic Learning of Procedures from Web Instructional Videos | Code | 0
Improving Interpretability of Deep Neural Networks with Semantic Information | - | 0
Temporal Tessellation: A Unified Approach for Video Analysis | Code | 0
Top-down Visual Saliency Guided by Captions | Code | 0
Video Captioning with Multi-Faceted Attention | - | 0
Bidirectional Multirate Reconstruction for Temporal Modeling in Videos | - | 0
Hierarchical Boundary-Aware Neural Encoder for Video Captioning | - | 0
Video Captioning with Transferred Semantic Attributes | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | mPLUG-2 | CIDEr | 80 | - | Unverified
2 | VAST | CIDEr | 78 | - | Unverified
3 | GIT2 | CIDEr | 75.9 | - | Unverified
4 | VLAB | CIDEr | 74.9 | - | Unverified
5 | COSA | CIDEr | 74.7 | - | Unverified
6 | VALOR | CIDEr | 74 | - | Unverified
7 | MaMMUT (ours) | CIDEr | 73.6 | - | Unverified
8 | VideoCoCa | CIDEr | 73.2 | - | Unverified
9 | RTQ | CIDEr | 69.3 | - | Unverified
10 | HowToCaption | CIDEr | 65.3 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MaMMUT | CIDEr | 195.6 | - | Unverified
2 | VLAB | CIDEr | 179.8 | - | Unverified
3 | COSA | CIDEr | 178.5 | - | Unverified
4 | VALOR | CIDEr | 178.5 | - | Unverified
5 | mPLUG-2 | CIDEr | 165.8 | - | Unverified
6 | HowToCaption | CIDEr | 154.2 | - | Unverified
7 | HiTeA | CIDEr | 146.9 | - | Unverified
8 | Vid2Seq | CIDEr | 146.2 | - | Unverified
9 | VIOLETv2 | CIDEr | 139.2 | - | Unverified
10 | RTQ | CIDEr | 123.4 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VAST | BLEU-4 | 18.2 | - | Unverified
2 | UniVL + MELTR | BLEU-4 | 17.92 | - | Unverified
3 | UniVL | BLEU-4 | 17.35 | - | Unverified
4 | VideoCoCa | BLEU-4 | 14.2 | - | Unverified
5 | VLM | BLEU-4 | 12.27 | - | Unverified
6 | E2vidD6-MASSvid-BiD | BLEU-4 | 12.04 | - | Unverified
7 | TextKG | BLEU-4 | 11.7 | - | Unverified
8 | COOT | BLEU-4 | 11.3 | - | Unverified
9 | COSA | BLEU-4 | 10.1 | - | Unverified
10 | HowToCaption | BLEU-4 | 8.8 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VALOR | BLEU-4 | 45.6 | - | Unverified
2 | VAST | BLEU-4 | 45 | - | Unverified
3 | COSA | BLEU-4 | 43.7 | - | Unverified
4 | VideoCoCa | BLEU-4 | 39.7 | - | Unverified
5 | IcoCap (ViT-B/16) | BLEU-4 | 37.4 | - | Unverified
6 | IcoCap (ViT-B/32) | BLEU-4 | 36.9 | - | Unverified
7 | VASTA (Kinetics-backbone) | BLEU-4 | 36.25 | - | Unverified
8 | CoCap (ViT/L14) | BLEU-4 | 35.8 | - | Unverified
9 | ORG-TRL | BLEU-4 | 32.1 | - | Unverified
10 | NITS-VC | BLEU-4 | 20 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VideoCoCa | BLEU4 | 14.7 | - | Unverified
2 | VLTinT (ae-test split) C3D/Ling | BLEU4 | 14.5 | - | Unverified
3 | VLCap (ae-test split) - Appearance + Language | BLEU4 | 13.38 | - | Unverified
4 | COOT (ae-test split) - Only Appearance features | BLEU4 | 10.85 | - | Unverified
5 | MART (ae-test split) - Appearance + Flow | BLEU4 | 10.33 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CEN | CIDEr | 49.87 | - | Unverified
2 | GIT | CIDEr | 32.43 | - | Unverified
3 | SEM-POS | CIDEr | 26.01 | - | Unverified
4 | AKGNN | CIDEr | 25.9 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CEN | CIDEr | 63.51 | - | Unverified
2 | GIT | CIDEr | 45.63 | - | Unverified
3 | SEM-POS | CIDEr | 37.16 | - | Unverified
4 | AKGNN | CIDEr | 35.08 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SBD_Keyframe | BLEU4 | 41.01 | - | Unverified
2 | V+S-Att-based | BLEU4 | 36.2 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VAST | BLEU-4 | 19.9 | - | Unverified
2 | COSA | BLEU-4 | 18.8 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | GVT | BLEU4 | 17.7 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VNS-GRU (Cross-Lingual) | BLEU-4 | 58.68 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Shot2Story | CIDEr | 37.4 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Vid2Seq | CIDEr | 120.5 | - | Unverified