SOTAVerified

Video Captioning

Video captioning is the task of automatically generating a caption for a video by understanding the actions and events in it, which can help retrieve the video efficiently through text.

Source: NITS-VC System for VATEX Video Captioning Challenge 2020

Papers

Showing 351-400 of 473 papers

| Title | Status | Hype |
| --- | --- | --- |
| Temporal Perceiving Video-Language Pre-training | | 0 |
| Text with Knowledge Graph Augmented Transformer for Video Captioning | | 0 |
| The 8th AI City Challenge | | 0 |
| The Devil is in the Distributions: Explicit Modeling of Scene Content is Key in Zero-Shot Video Captioning | | 0 |
| The Use of Video Captioning for Fostering Physical Activity | | 0 |
| TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation | | 0 |
| Title Generation for User Generated Videos | | 0 |
| Towards Bridging Event Captioner and Sentence Localizer for Weakly Supervised Dense Event Captioning | | 0 |
| Towards Holistic Language-video Representation: the language model-enhanced MSR-Video to Text Dataset | | 0 |
| Transformer in action: a comparative study of transformer-based acoustic models for large scale speech recognition applications | | 0 |
| TRECVID 2019: An Evaluation Campaign to Benchmark Video Activity Detection, Video Captioning and Matching, and Video Search & Retrieval | | 0 |
| Towards Surveillance Video-and-Language Understanding: New Dataset, Baselines, and Challenges | | 0 |
| Understanding Action Sequences based on Video Captioning for Learning-from-Observation | | 0 |
| Variational Stacked Local Attention Networks for Diverse Video Captioning | | 0 |
| VATEX Captioning Challenge 2019: Multi-modal Information Fusion and Multi-stage Training Strategy for Video Captioning | | 0 |
| Vector Learning for Cross Domain Representations | | 0 |
| VersaVid-R1: A Versatile Video Understanding and Reasoning Model from Question Answering to Captioning Tasks | | 0 |
| ViCaS: A Dataset for Combining Holistic and Pixel-level Video Understanding using Captions with Grounded Segmentation | | 0 |
| Video Captioning: a comparative review of where we are and which could be the route | | 0 |
| Video Captioning in Compressed Video | | 0 |
| Video Captioning Using Weak Annotation | | 0 |
| Video Captioning via Hierarchical Reinforcement Learning | | 0 |
| Video Captioning with Aggregated Features Based on Dual Graphs and Gated Fusion | | 0 |
| Video Captioning with Boundary-aware Hierarchical Language Decoding and Joint Video Prediction | | 0 |
| Video Captioning with Guidance of Multimodal Latent Topics | | 0 |
| Video Captioning with Multi-Faceted Attention | | 0 |
| Video Captioning with Text-based Dynamic Attention and Step-by-Step Learning | | 0 |
| Video Captioning with Transferred Semantic Attributes | | 0 |
| Video LLMs for Temporal Reasoning in Long Videos | | 0 |
| VideoOFA: Two-Stage Pre-Training for Video-to-Text Generation | | 0 |
| Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks | | 0 |
| VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | | 0 |
| Vision and Language: from Visual Perception to Content Creation | | 0 |
| Visual-aware Attention Dual-stream Decoder for Video Captioning | | 0 |
| VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending | | 0 |
| Watch and Learn: Leveraging Expert Knowledge and Language for Surgical Video Understanding | | 0 |
| Watch It Twice: Video Captioning with a Refocused Video Encoder | | 0 |
| Weakly Supervised Dense Video Captioning | | 0 |
| Weakly Supervised Dense Video Captioning via Jointly Usage of Knowledge Distillation and Cross-modal Matching | | 0 |
| What's in a Video: Factorized Autoregressive Decoding for Online Dense Video Captioning | | 0 |
| Wolf: Captioning Everything with a World Summarization Framework | | 0 |
| Zero-Shot Dense Video Captioning by Jointly Optimizing Text and Moment | | 0 |
| Dual-path Collaborative Generation Network for Emotional Video Captioning | Code | 0 |
| Reconstruction Network for Video Captioning | Code | 0 |
| Live Video Captioning | Code | 0 |
| Tragedy Plus Time: Capturing Unintended Human Activities from Weakly-labeled Videos | Code | 0 |
| Refined Semantic Enhancement towards Frequency Diffusion for Video Captioning | Code | 0 |
| Watch, Listen, and Describe: Globally and Locally Aligned Cross-Modal Attentions for Video Captioning | Code | 0 |
| Translating Videos to Natural Language Using Deep Recurrent Neural Networks | Code | 0 |
| Cross-Modal Graph with Meta Concepts for Video Captioning | Code | 0 |

Benchmark Results

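| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | mPLUG-2 | CIDEr | 80 | | Unverified |
| 2 | VAST | CIDEr | 78 | | Unverified |
| 3 | GIT2 | CIDEr | 75.9 | | Unverified |
| 4 | VLAB | CIDEr | 74.9 | | Unverified |
| 5 | COSA | CIDEr | 74.7 | | Unverified |
| 6 | VALOR | CIDEr | 74 | | Unverified |
| 7 | MaMMUT (ours) | CIDEr | 73.6 | | Unverified |
| 8 | VideoCoCa | CIDEr | 73.2 | | Unverified |
| 9 | RTQ | CIDEr | 69.3 | | Unverified |
| 10 | HowToCaption | CIDEr | 65.3 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | MaMMUT | CIDEr | 195.6 | | Unverified |
| 2 | VLAB | CIDEr | 179.8 | | Unverified |
| 3 | COSA | CIDEr | 178.5 | | Unverified |
| 4 | VALOR | CIDEr | 178.5 | | Unverified |
| 5 | mPLUG-2 | CIDEr | 165.8 | | Unverified |
| 6 | HowToCaption | CIDEr | 154.2 | | Unverified |
| 7 | HiTeA | CIDEr | 146.9 | | Unverified |
| 8 | Vid2Seq | CIDEr | 146.2 | | Unverified |
| 9 | VIOLETv2 | CIDEr | 139.2 | | Unverified |
| 10 | RTQ | CIDEr | 123.4 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | VAST | BLEU-4 | 18.2 | | Unverified |
| 2 | UniVL + MELTR | BLEU-4 | 17.92 | | Unverified |
| 3 | UniVL | BLEU-4 | 17.35 | | Unverified |
| 4 | VideoCoCa | BLEU-4 | 14.2 | | Unverified |
| 5 | VLM | BLEU-4 | 12.27 | | Unverified |
| 6 | E2vidD6-MASSvid-BiD | BLEU-4 | 12.04 | | Unverified |
| 7 | TextKG | BLEU-4 | 11.7 | | Unverified |
| 8 | COOT | BLEU-4 | 11.3 | | Unverified |
| 9 | COSA | BLEU-4 | 10.1 | | Unverified |
| 10 | HowToCaption | BLEU-4 | 8.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | VALOR | BLEU-4 | 45.6 | | Unverified |
| 2 | VAST | BLEU-4 | 45 | | Unverified |
| 3 | COSA | BLEU-4 | 43.7 | | Unverified |
| 4 | VideoCoCa | BLEU-4 | 39.7 | | Unverified |
| 5 | IcoCap (ViT-B/16) | BLEU-4 | 37.4 | | Unverified |
| 6 | IcoCap (ViT-B/32) | BLEU-4 | 36.9 | | Unverified |
| 7 | VASTA (Kinetics-backbone) | BLEU-4 | 36.25 | | Unverified |
| 8 | CoCap (ViT/L14) | BLEU-4 | 35.8 | | Unverified |
| 9 | ORG-TRL | BLEU-4 | 32.1 | | Unverified |
| 10 | NITS-VC | BLEU-4 | 20 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | VideoCoCa | BLEU4 | 14.7 | | Unverified |
| 2 | VLTinT (ae-test split) C3D/Ling | BLEU4 | 14.5 | | Unverified |
| 3 | VLCap (ae-test split) - Appearance + Language | BLEU4 | 13.38 | | Unverified |
| 4 | COOT (ae-test split) - Only Appearance features | BLEU4 | 10.85 | | Unverified |
| 5 | MART (ae-test split) - Appearance + Flow | BLEU4 | 10.33 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | CEN | CIDEr | 49.87 | | Unverified |
| 2 | GIT | CIDEr | 32.43 | | Unverified |
| 3 | SEM-POS | CIDEr | 26.01 | | Unverified |
| 4 | AKGNN | CIDEr | 25.9 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | CEN | CIDEr | 63.51 | | Unverified |
| 2 | GIT | CIDEr | 45.63 | | Unverified |
| 3 | SEM-POS | CIDEr | 37.16 | | Unverified |
| 4 | AKGNN | CIDEr | 35.08 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | SBD_Keyframe | BLEU4 | 41.01 | | Unverified |
| 2 | V+S-Att-based | BLEU4 | 36.2 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | VAST | BLEU-4 | 19.9 | | Unverified |
| 2 | COSA | BLEU-4 | 18.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GVT | BLEU4 | 17.7 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | VNS-GRU (Cross-Lingual) | BLEU-4 | 58.68 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Shot2Story | CIDEr | 37.4 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Vid2Seq | CIDEr | 120.5 | | Unverified |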