SOTAVerified

Video Captioning

Video captioning is the task of automatically generating a caption for a video by understanding the actions and events it contains. The resulting text makes the video easier to retrieve through text search.

Source: NITS-VC System for VATEX Video Captioning Challenge 2020
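
To illustrate how captions support text-based retrieval, here is a minimal sketch that indexes captions by word and looks up videos by keyword. The video ids, the captions, and the helper names `build_index` and `search` are hypothetical and chosen for illustration only. Real retrieval systems typically use learned text-video embeddings rather than exact word matching.

```python
from collections import defaultdict


def build_index(captions):
    """Map each lowercase caption word to the set of video ids whose caption contains it."""
    index = defaultdict(set)
    for video_id, caption in captions.items():
        for word in caption.lower().split():
            index[word].add(video_id)
    return index


def search(index, query):
    """Return the sorted ids of videos whose caption contains every query word."""
    words = query.lower().split()
    if not words:
        return []
    # index.get avoids creating empty entries for unknown words.
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)


# Hypothetical captions, standing in for the output of a captioning model.
captions = {
    "vid_001": "a man is slicing an onion in a kitchen",
    "vid_002": "a dog is catching a frisbee in a park",
    "vid_003": "a woman is slicing bread in a kitchen",
}

index = build_index(captions)
print(search(index, "slicing kitchen"))  # ['vid_001', 'vid_003']
print(search(index, "dog park"))         # ['vid_002']
```

Exact word matching keeps the sketch short. It misses synonyms and paraphrases, which is one reason caption quality matters for retrieval.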

Papers

Showing 251-300 of 473 papers

Title | Status | Hype
MAViC: Multimodal Active Learning for Video Captioning | - | 0
VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | - | 0
Refined Semantic Enhancement towards Frequency Diffusion for Video Captioning | Code | 0
Aligning Source Visual and Target Language Domains for Unpaired Video Captioning | - | 0
Event and Entity Extraction from Generated Video Captions | Code | 0
Fighting FIRe with FIRE: Assessing the Validity of Text-to-Video Retrieval Benchmarks | - | 0
Recipe Generation from Unsegmented Cooking Videos | - | 0
OmniVL:One Foundation Model for Image-Language and Video-Language Tasks | - | 0
StoryDALL-E: Adapting Pretrained Text-to-Image Transformers for Story Continuation | Code | 0
Diverse Video Captioning by Adaptive Spatio-temporal Attention | Code | 0
Boosting Video-Text Retrieval with Explicit High-Level Semantics | - | 0
SAVCHOI: Detecting Suspicious Activities using Dense Video Captioning with Human Object Interactions | - | 0
Dual-Stream Transformer for Generic Event Boundary Captioning | Code | 0
PIC 4th Challenge: Semantic-Assisted Multi-Feature Encoding and Multi-Head Decoding for Dense Video Captioning | - | 0
Modality Alignment between Deep Representations for Effective Video-and-Language Learning | - | 0
Support-set based Multi-modal Representation Enhancement for Video Captioning | Code | 0
Attract me to Buy: Advertisement Copywriting Generation with Multimodal Multi-structured Information | - | 0
Dual-Level Decoupled Transformer for Video Captioning | - | 0
Tragedy Plus Time: Capturing Unintended Human Activities from Weakly-labeled Videos | Code | 0
End-to-end Dense Video Captioning as Sequence Generation | - | 0
Semantic-Aware Pretraining for Dense Video Captioning | - | 0
Video Captioning: a comparative review of where we are and which could be the route | - | 0
Learning Audio-Video Modalities from Image Captions | - | 0
CREATE: A Benchmark for Chinese Short Video Retrieval and Title Generation | - | 0
Global2Local: A Joint-Hierarchical Attention for Video Captioning | - | 0
Exploiting long-term temporal dynamics for video captioning | - | 0
BERTHA: Video Captioning Evaluation Via Transfer-Learned Human Assessment | Code | 0
Generative Adversarial Network Applications in Creating a Meta-Universe | - | 0
An Integrated Approach for Video Captioning and Applications | - | 0
End-to-end Generative Pretraining for Multimodal Video Captioning | - | 0
Discourse Analysis for Evaluating Coherence in Video Paragraph Captions | - | 0
End-to-end Dense Video Captioning as Sequence Generation | - | 0
Boosting Video Representation Learning with Multi-Faceted Integration | - | 0
Variational Stacked Local Attention Networks for Diverse Video Captioning | - | 0
Dense Video Captioning Using Unsupervised Semantic Information | Code | 0
CoCo-BERT: Improving Video-Language Pre-training with Contrastive Cross-modal Matching and Denoising | - | 0
Syntax Customized Video Captioning by Imitating Exemplar Sentences | Code | 0
Multi-modal Dependency Tree for Video Captioning | - | 0
An Efficient Keyframes Selection Based Framework for Video Captioning | - | 0
CLIP Meets Video Captioning: Concept-Aware Representation Learning Does Matter | Code | 0
DVCFlow: Modeling Information Flow Towards Human-like Video Captioning | - | 0
Fill-in-the-Blank: A Challenging Video Understanding Evaluation Framework | - | 0
CREATE: A Benchmark for Chinese Short Video Retrieval and Title Generation | - | 0
E-MMAD: Multimodal Advertising Caption Generation Based on Structured Information | - | 0
Visual-aware Attention Dual-stream Decoder for Video Captioning | - | 0
CLIP4Caption: CLIP for Video Caption | - | 0
CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations | - | 0
Graph Similarities and Dual Approach for Sequential Text-to-Image Retrieval | - | 0
OSVidCap: A Framework for the Simultaneous Recognition and Description of Concurrent Actions in Videos in an Open-Set Scenario | Code | 0
Hierarchical Multimodal Transformer to Summarize Videos | - | 0

Benchmark Results

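# | Model | Metric | Claimed | Verified | Status
1 | mPLUG-2 | CIDEr | 80 | - | Unverified
2 | VAST | CIDEr | 78 | - | Unverified
3 | GIT2 | CIDEr | 75.9 | - | Unverified
4 | VLAB | CIDEr | 74.9 | - | Unverified
5 | COSA | CIDEr | 74.7 | - | Unverified
6 | VALOR | CIDEr | 74 | - | Unverified
7 | MaMMUT (ours) | CIDEr | 73.6 | - | Unverified
8 | VideoCoCa | CIDEr | 73.2 | - | Unverified
9 | RTQ | CIDEr | 69.3 | - | Unverified
10 | HowToCaption | CIDEr | 65.3 | - | Unverified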
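
# | Model | Metric | Claimed | Verified | Status
1 | MaMMUT | CIDEr | 195.6 | - | Unverified
2 | VLAB | CIDEr | 179.8 | - | Unverified
3 | COSA | CIDEr | 178.5 | - | Unverified
4 | VALOR | CIDEr | 178.5 | - | Unverified
5 | mPLUG-2 | CIDEr | 165.8 | - | Unverified
6 | HowToCaption | CIDEr | 154.2 | - | Unverified
7 | HiTeA | CIDEr | 146.9 | - | Unverified
8 | Vid2Seq | CIDEr | 146.2 | - | Unverified
9 | VIOLETv2 | CIDEr | 139.2 | - | Unverified
10 | RTQ | CIDEr | 123.4 | - | Unverified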
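
# | Model | Metric | Claimed | Verified | Status
1 | VAST | BLEU-4 | 18.2 | - | Unverified
2 | UniVL + MELTR | BLEU-4 | 17.92 | - | Unverified
3 | UniVL | BLEU-4 | 17.35 | - | Unverified
4 | VideoCoCa | BLEU-4 | 14.2 | - | Unverified
5 | VLM | BLEU-4 | 12.27 | - | Unverified
6 | E2vidD6-MASSvid-BiD | BLEU-4 | 12.04 | - | Unverified
7 | TextKG | BLEU-4 | 11.7 | - | Unverified
8 | COOT | BLEU-4 | 11.3 | - | Unverified
9 | COSA | BLEU-4 | 10.1 | - | Unverified
10 | HowToCaption | BLEU-4 | 8.8 | - | Unverified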

# | Model | Metric | Claimed | Verified | Status
1 | VALOR | BLEU-4 | 45.6 | - | Unverified
2 | VAST | BLEU-4 | 45 | - | Unverified
3 | COSA | BLEU-4 | 43.7 | - | Unverified
4 | VideoCoCa | BLEU-4 | 39.7 | - | Unverified
5 | IcoCap (ViT-B/16) | BLEU-4 | 37.4 | - | Unverified
6 | IcoCap (ViT-B/32) | BLEU-4 | 36.9 | - | Unverified
7 | VASTA (Kinetics-backbone) | BLEU-4 | 36.25 | - | Unverified
8 | CoCap (ViT/L14) | BLEU-4 | 35.8 | - | Unverified
9 | ORG-TRL | BLEU-4 | 32.1 | - | Unverified
10 | NITS-VC | BLEU-4 | 20 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VideoCoCa | BLEU4 | 14.7 | - | Unverified
2 | VLTinT (ae-test split) C3D/Ling | BLEU4 | 14.5 | - | Unverified
3 | VLCap (ae-test split) - Appearance + Language | BLEU4 | 13.38 | - | Unverified
4 | COOT (ae-test split) - Only Appearance features | BLEU4 | 10.85 | - | Unverified
5 | MART (ae-test split) - Appearance + Flow | BLEU4 | 10.33 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CEN | CIDEr | 49.87 | - | Unverified
2 | GIT | CIDEr | 32.43 | - | Unverified
3 | SEM-POS | CIDEr | 26.01 | - | Unverified
4 | AKGNN | CIDEr | 25.9 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CEN | CIDEr | 63.51 | - | Unverified
2 | GIT | CIDEr | 45.63 | - | Unverified
3 | SEM-POS | CIDEr | 37.16 | - | Unverified
4 | AKGNN | CIDEr | 35.08 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SBD_Keyframe | BLEU4 | 41.01 | - | Unverified
2 | V+S-Att-based | BLEU4 | 36.2 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VAST | BLEU-4 | 19.9 | - | Unverified
2 | COSA | BLEU-4 | 18.8 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | GVT | BLEU4 | 17.7 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VNS-GRU (Cross-Lingual) | BLEU-4 | 58.68 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Shot2Story | CIDEr | 37.4 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Vid2Seq | CIDEr | 120.5 | - | Unverified