Video Captioning

Video captioning is the task of automatically generating a caption for a video by understanding the actions and events it contains, which in turn supports efficient text-based retrieval of the video.

Source: NITS-VC System for VATEX Video Captioning Challenge 2020

Papers

Showing 351–400 of 473 papers

| Title | Status | Hype |
|---|---|---|
| Crowd Video Captioning | | 0 |
| Video Captioning with Text-based Dynamic Attention and Step-by-Step Learning | | 0 |
| Low-Rank HOCA: Efficient High-Order Cross-Modal Attention for Video Captioning | | 0 |
| Guiding the Flowing of Semantics: Interpretable Video Captioning via POS Tag | | 0 |
| Diverse Video Captioning Through Latent Variable Expansion | | 0 |
| Vatex Video Captioning Challenge 2020: Multi-View Features and Hybrid Reward Strategies for Video Captioning | | 0 |
| Imperial College London Submission to VATEX Video Captioning Task | | 0 |
| Integrating Temporal and Spatial Attentions for VATEX Video Captioning Challenge 2019 | | 0 |
| VATEX Captioning Challenge 2019: Multi-modal Information Fusion and Multi-stage Training Strategy for Video Captioning | | 0 |
| SMArT: Training Shallow Memory-aware Transformers for Robotic Explainability | | 0 |
| Human Action Sequence Classification | | 0 |
| Joint Syntax Representation Learning and Visual Cue Translation for Video Captioning | | 0 |
| ContCap: A scalable framework for continual image captioning | Code | 0 |
| Learning Actions from Human Demonstration Video for Robotic Manipulation | | 0 |
| A Semantics-Assisted Video Captioning Model Trained with Scheduled Sampling | Code | 0 |
| Controllable Video Captioning with POS Sequence Guidance Based on Gated Fusion Network | Code | 0 |
| Prediction and Description of Near-Future Activities in Video | | 0 |
| Watch It Twice: Video Captioning with a Refocused Video Encoder | | 0 |
| OmniNet: A unified architecture for multi-modal multi-task learning | Code | 0 |
| Activitynet 2019 Task 3: Exploring Contexts for Dense Captioning Events in Videos | | 0 |
| Learning Video Representations using Contrastive Bidirectional Transformer | | 0 |
| Continual and Multi-Task Architecture Search | Code | 0 |
| Object-aware Aggregation with Bidirectional Temporal Graph for Video Captioning | | 0 |
| Attention is all you need for Videos: Self-attention based Video Summarization using Universal Transformers | | 0 |
| Relational Reasoning using Prior Knowledge for Visual Captioning | | 0 |
| Reconstruct and Represent Video Contents for Captioning via Reinforcement Learning | | 0 |
| Learning to Generate Grounded Visual Captions without Localization Supervision | Code | 1 |
| Interactive-predictive neural multimodal systems | | 0 |
| A Neural, Interactive-predictive System for Multimodal Sequence to Sequence Tasks | Code | 0 |
| On Flow Profile Image for Video Representation | | 0 |
| Memory-Attended Recurrent Network for Video Captioning | Code | 0 |
| Multimodal Semantic Attention Network for Video Captioning | | 0 |
| Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning | Code | 0 |
| Hierarchical Recurrent Neural Network for Video Summarization | | 0 |
| Large Scale Holistic Video Understanding | Code | 1 |
| Membership Inference Attacks on Sequence-to-Sequence Models: Is My Data In Your Machine Translation System? | Code | 0 |
| What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment | Code | 1 |
| Streamlined Dense Video Captioning | Code | 0 |
| VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research | Code | 1 |
| End-to-End Video Captioning | | 0 |
| VideoBERT: A Joint Model for Video and Language Representation Learning | Code | 0 |
| M-VAD Names: a Dataset for Video Captioning with Naming | Code | 0 |
| Spatio-Temporal Dynamics and Semantic Attribute Enriched Visual Encoding for Video Captioning | | 0 |
| Not All Words are Equal: Video-specific Information Loss for Video Captioning | | 0 |
| Hierarchical LSTMs with Adaptive Attention for Visual Captioning | | 0 |
| An Attempt towards Interpretable Audio-Visual Video Captioning | | 0 |
| Learning to Compose Topic-Aware Mixture of Experts for Zero-Shot Video Captioning | | 0 |
| Middle-Out Decoding | | 0 |
| Cross-Modal and Hierarchical Modeling of Video and Text | Code | 0 |
| Image-to-Video Person Re-Identification by Reusing Cross-modal Embeddings | | 0 |

Benchmark Results

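| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | mPLUG-2 | CIDEr | 80 | | Unverified |
| 2 | VAST | CIDEr | 78 | | Unverified |
| 3 | GIT2 | CIDEr | 75.9 | | Unverified |
| 4 | VLAB | CIDEr | 74.9 | | Unverified |
| 5 | COSA | CIDEr | 74.7 | | Unverified |
| 6 | VALOR | CIDEr | 74 | | Unverified |
| 7 | MaMMUT (ours) | CIDEr | 73.6 | | Unverified |
| 8 | VideoCoCa | CIDEr | 73.2 | | Unverified |
| 9 | RTQ | CIDEr | 69.3 | | Unverified |
| 10 | HowToCaption | CIDEr | 65.3 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MaMMUT | CIDEr | 195.6 | | Unverified |
| 2 | VLAB | CIDEr | 179.8 | | Unverified |
| 3 | COSA | CIDEr | 178.5 | | Unverified |
| 4 | VALOR | CIDEr | 178.5 | | Unverified |
| 5 | mPLUG-2 | CIDEr | 165.8 | | Unverified |
| 6 | HowToCaption | CIDEr | 154.2 | | Unverified |
| 7 | HiTeA | CIDEr | 146.9 | | Unverified |
| 8 | Vid2Seq | CIDEr | 146.2 | | Unverified |
| 9 | VIOLETv2 | CIDEr | 139.2 | | Unverified |
| 10 | RTQ | CIDEr | 123.4 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | VAST | BLEU-4 | 18.2 | | Unverified |
| 2 | UniVL + MELTR | BLEU-4 | 17.92 | | Unverified |
| 3 | UniVL | BLEU-4 | 17.35 | | Unverified |
| 4 | VideoCoCa | BLEU-4 | 14.2 | | Unverified |
| 5 | VLM | BLEU-4 | 12.27 | | Unverified |
| 6 | E2vidD6-MASSvid-BiD | BLEU-4 | 12.04 | | Unverified |
| 7 | TextKG | BLEU-4 | 11.7 | | Unverified |
| 8 | COOT | BLEU-4 | 11.3 | | Unverified |
| 9 | COSA | BLEU-4 | 10.1 | | Unverified |
| 10 | HowToCaption | BLEU-4 | 8.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | VALOR | BLEU-4 | 45.6 | | Unverified |
| 2 | VAST | BLEU-4 | 45 | | Unverified |
| 3 | COSA | BLEU-4 | 43.7 | | Unverified |
| 4 | VideoCoCa | BLEU-4 | 39.7 | | Unverified |
| 5 | IcoCap (ViT-B/16) | BLEU-4 | 37.4 | | Unverified |
| 6 | IcoCap (ViT-B/32) | BLEU-4 | 36.9 | | Unverified |
| 7 | VASTA (Kinetics-backbone) | BLEU-4 | 36.25 | | Unverified |
| 8 | CoCap (ViT/L14) | BLEU-4 | 35.8 | | Unverified |
| 9 | ORG-TRL | BLEU-4 | 32.1 | | Unverified |
| 10 | NITS-VC | BLEU-4 | 20 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | VideoCoCa | BLEU4 | 14.7 | | Unverified |
| 2 | VLTinT (ae-test split) C3D/Ling | BLEU4 | 14.5 | | Unverified |
| 3 | VLCap (ae-test split) - Appearance + Language | BLEU4 | 13.38 | | Unverified |
| 4 | COOT (ae-test split) - Only Appearance features | BLEU4 | 10.85 | | Unverified |
| 5 | MART (ae-test split) - Appearance + Flow | BLEU4 | 10.33 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | CEN | CIDEr | 49.87 | | Unverified |
| 2 | GIT | CIDEr | 32.43 | | Unverified |
| 3 | SEM-POS | CIDEr | 26.01 | | Unverified |
| 4 | AKGNN | CIDEr | 25.9 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | CEN | CIDEr | 63.51 | | Unverified |
| 2 | GIT | CIDEr | 45.63 | | Unverified |
| 3 | SEM-POS | CIDEr | 37.16 | | Unverified |
| 4 | AKGNN | CIDEr | 35.08 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | SBD_Keyframe | BLEU4 | 41.01 | | Unverified |
| 2 | V+S-Att-based | BLEU4 | 36.2 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | VAST | BLEU-4 | 19.9 | | Unverified |
| 2 | COSA | BLEU-4 | 18.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GVT | BLEU4 | 17.7 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | VNS-GRU (Cross-Lingual) | BLEU-4 | 58.68 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Shot2Story | CIDEr | 37.4 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Vid2Seq | CIDEr | 120.5 | | Unverified |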