SOTAVerified

Video Captioning

Video Captioning is the task of automatically generating a caption for a video by understanding the actions and events in it, which can help retrieve the video efficiently through text.

Source: NITS-VC System for VATEX Video Captioning Challenge 2020

Papers

Showing 201-250 of 473 papers

Title | Status | Hype
Recipe Generation from Unsegmented Cooking Videos | - | 0
OmniVL: One Foundation Model for Image-Language and Video-Language Tasks | - | 0
StoryDALL-E: Adapting Pretrained Text-to-Image Transformers for Story Continuation | Code | 0
An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling | Code | 1
Partially Relevant Video Retrieval | Code | 1
Diverse Video Captioning by Adaptive Spatio-temporal Attention | Code | 0
Boosting Video-Text Retrieval with Explicit High-Level Semantics | - | 0
SAVCHOI: Detecting Suspicious Activities using Dense Video Captioning with Human Object Interactions | - | 0
Zero-Shot Video Captioning with Evolving Pseudo-Tokens | Code | 1
Unifying Event Detection and Captioning as Sequence Generation via Pre-Training | Code | 1
Dual-Stream Transformer for Generic Event Boundary Captioning | Code | 0
PIC 4th Challenge: Semantic-Assisted Multi-Feature Encoding and Multi-Head Decoding for Dense Video Captioning | - | 0
Rethinking Surgical Captioning: End-to-End Window-Based MLP Transformer Using Patches | Code | 1
VLCap: Vision-Language with Contrastive Learning for Coherent Video Paragraph Captioning | Code | 1
LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling | Code | 1
Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs | Code | 2
Modality Alignment between Deep Representations for Effective Video-and-Language Learning | - | 0
GIT: A Generative Image-to-text Transformer for Vision and Language | Code | 2
Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners | Code | 1
GL-RG: Global-Local Representation Granularity for Video Captioning | Code | 1
Support-set based Multi-modal Representation Enhancement for Video Captioning | Code | 0
Attract me to Buy: Advertisement Copywriting Generation with Multimodal Multi-structured Information | - | 0
Dual-Level Decoupled Transformer for Video Captioning | - | 0
Tragedy Plus Time: Capturing Unintended Human Activities from Weakly-labeled Videos | Code | 0
End-to-end Dense Video Captioning as Sequence Generation | - | 0
MUGEN: A Playground for Video-Audio-Text Multimodal Understanding and GENeration | Code | 1
Semantic-Aware Pretraining for Dense Video Captioning | - | 0
Video Captioning: a comparative review of where we are and which could be the route | - | 0
Learning Audio-Video Modalities from Image Captions | - | 0
CREATE: A Benchmark for Chinese Short Video Retrieval and Title Generation | - | 0
Global2Local: A Joint-Hierarchical Attention for Video Captioning | - | 0
Exploiting long-term temporal dynamics for video captioning | - | 0
BERTHA: Video Captioning Evaluation Via Transfer-Learned Human Assessment | Code | 0
An Integrated Approach for Video Captioning and Applications | - | 0
Generative Adversarial Network Applications in Creating a Meta-Universe | - | 0
End-to-end Generative Pretraining for Multimodal Video Captioning | - | 0
Discourse Analysis for Evaluating Coherence in Video Paragraph Captions | - | 0
End-to-end Dense Video Captioning as Sequence Generation | - | 0
Boosting Video Representation Learning with Multi-Faceted Integration | - | 0
Variational Stacked Local Attention Networks for Diverse Video Captioning | - | 0
Tell me what you see: A zero-shot action recognition method based on natural language descriptions | Code | 1
Dense Video Captioning Using Unsupervised Semantic Information | Code | 0
CoCo-BERT: Improving Video-Language Pre-training with Contrastive Cross-modal Matching and Denoising | - | 0
Syntax Customized Video Captioning by Imitating Exemplar Sentences | Code | 0
Controllable Video Captioning with an Exemplar Sentence | Code | 1
An Efficient Keyframes Selection Based Framework for Video Captioning | - | 0
Multi-modal Dependency Tree for Video Captioning | - | 0
CLIP Meets Video Captioning: Concept-Aware Representation Learning Does Matter | Code | 0
SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning | Code | 1
Hierarchical Modular Network for Video Captioning | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | mPLUG-2 | CIDEr | 80 | - | Unverified
2 | VAST | CIDEr | 78 | - | Unverified
3 | GIT2 | CIDEr | 75.9 | - | Unverified
4 | VLAB | CIDEr | 74.9 | - | Unverified
5 | COSA | CIDEr | 74.7 | - | Unverified
6 | VALOR | CIDEr | 74 | - | Unverified
7 | MaMMUT (ours) | CIDEr | 73.6 | - | Unverified
8 | VideoCoCa | CIDEr | 73.2 | - | Unverified
9 | RTQ | CIDEr | 69.3 | - | Unverified
10 | HowToCaption | CIDEr | 65.3 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MaMMUT | CIDEr | 195.6 | - | Unverified
2 | VLAB | CIDEr | 179.8 | - | Unverified
3 | COSA | CIDEr | 178.5 | - | Unverified
4 | VALOR | CIDEr | 178.5 | - | Unverified
5 | mPLUG-2 | CIDEr | 165.8 | - | Unverified
6 | HowToCaption | CIDEr | 154.2 | - | Unverified
7 | HiTeA | CIDEr | 146.9 | - | Unverified
8 | Vid2Seq | CIDEr | 146.2 | - | Unverified
9 | VIOLETv2 | CIDEr | 139.2 | - | Unverified
10 | RTQ | CIDEr | 123.4 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VAST | BLEU-4 | 18.2 | - | Unverified
2 | UniVL + MELTR | BLEU-4 | 17.92 | - | Unverified
3 | UniVL | BLEU-4 | 17.35 | - | Unverified
4 | VideoCoCa | BLEU-4 | 14.2 | - | Unverified
5 | VLM | BLEU-4 | 12.27 | - | Unverified
6 | E2vidD6-MASSvid-BiD | BLEU-4 | 12.04 | - | Unverified
7 | TextKG | BLEU-4 | 11.7 | - | Unverified
8 | COOT | BLEU-4 | 11.3 | - | Unverified
9 | COSA | BLEU-4 | 10.1 | - | Unverified
10 | HowToCaption | BLEU-4 | 8.8 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VALOR | BLEU-4 | 45.6 | - | Unverified
2 | VAST | BLEU-4 | 45 | - | Unverified
3 | COSA | BLEU-4 | 43.7 | - | Unverified
4 | VideoCoCa | BLEU-4 | 39.7 | - | Unverified
5 | IcoCap (ViT-B/16) | BLEU-4 | 37.4 | - | Unverified
6 | IcoCap (ViT-B/32) | BLEU-4 | 36.9 | - | Unverified
7 | VASTA (Kinetics-backbone) | BLEU-4 | 36.25 | - | Unverified
8 | CoCap (ViT/L14) | BLEU-4 | 35.8 | - | Unverified
9 | ORG-TRL | BLEU-4 | 32.1 | - | Unverified
10 | NITS-VC | BLEU-4 | 20 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VideoCoCa | BLEU4 | 14.7 | - | Unverified
2 | VLTinT (ae-test split) C3D/Ling | BLEU4 | 14.5 | - | Unverified
3 | VLCap (ae-test split) - Appearance + Language | BLEU4 | 13.38 | - | Unverified
4 | COOT (ae-test split) - Only Appearance features | BLEU4 | 10.85 | - | Unverified
5 | MART (ae-test split) - Appearance + Flow | BLEU4 | 10.33 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CEN | CIDEr | 49.87 | - | Unverified
2 | GIT | CIDEr | 32.43 | - | Unverified
3 | SEM-POS | CIDEr | 26.01 | - | Unverified
4 | AKGNN | CIDEr | 25.9 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CEN | CIDEr | 63.51 | - | Unverified
2 | GIT | CIDEr | 45.63 | - | Unverified
3 | SEM-POS | CIDEr | 37.16 | - | Unverified
4 | AKGNN | CIDEr | 35.08 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SBD_Keyframe | BLEU4 | 41.01 | - | Unverified
2 | V+S-Att-based | BLEU4 | 36.2 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VAST | BLEU-4 | 19.9 | - | Unverified
2 | COSA | BLEU-4 | 18.8 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | GVT | BLEU4 | 17.7 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VNS-GRU (Cross-Lingual) | BLEU-4 | 58.68 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Shot2Story | CIDEr | 37.4 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Vid2Seq | CIDEr | 120.5 | - | Unverified