SOTAVerified

Video Captioning

Video captioning is the task of automatically generating a caption for a video by understanding the actions and events in it, which in turn supports efficient text-based retrieval of the video.

Source: NITS-VC System for VATEX Video Captioning Challenge 2020
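As a rough illustration of how generated captions can support text-based retrieval, the sketch below builds a tiny keyword index over captions and looks up videos by query words. The video ids, captions, and the helper names `build_caption_index` and `search` are hypothetical and chosen only for this example; real systems typically use learned text and video embeddings rather than exact word matching.

```python
from collections import defaultdict


def build_caption_index(captions):
    """Map each lowercase word to the set of video ids whose caption contains it."""
    index = defaultdict(set)
    for video_id, caption in captions.items():
        for word in caption.lower().split():
            # Strip simple trailing/leading punctuation so "park." matches "park".
            index[word.strip(".,!?")].add(video_id)
    return index


def search(index, query):
    """Return ids of videos whose captions contain every word in the query."""
    words = [w.strip(".,!?") for w in query.lower().split()]
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical captions, as a captioning model might produce them.
captions = {
    "vid_001": "A man is slicing a tomato in a kitchen.",
    "vid_002": "A dog catches a frisbee in the park.",
    "vid_003": "A woman is slicing bread.",
}

index = build_caption_index(captions)
print(sorted(search(index, "slicing")))
print(sorted(search(index, "dog park")))
```

Exact word matching is used here only to keep the sketch short; it misses synonyms and paraphrases that an embedding-based retriever would handle.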

Papers

Showing 201–250 of 473 papers

| Title | Status | Hype |
| --- | --- | --- |
| Attention Based Encoder Decoder Model for Video Captioning in Nepali (2023) | | 0 |
| Attention based video captioning framework for Hindi | | 0 |
| Attention is all you need for Videos: Self-attention based Video Summarization using Universal Transformers | | 0 |
| Attract me to Buy: Advertisement Copywriting Generation with Multimodal Multi-structured Information | | 0 |
| Auto-captions on GIF: A Large-scale Video-sentence Dataset for Vision-language Pre-training | | 0 |
| Automatic Generation of Descriptive Titles for Video Clips Using Deep Learning | | 0 |
| Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding | | 0 |
| Best Vision Technologies Submission to ActivityNet Challenge 2018-Task: Dense-Captioning Events in Videos | | 0 |
| Beyond Caption To Narrative: Video Captioning With Multiple Sentences | | 0 |
| Bidirectional Long-Short Term Memory for Video Description | | 0 |
| Bidirectional Multirate Reconstruction for Temporal Modeling in Videos | | 0 |
| Boosting Video Captioning with Dynamic Loss Network | | 0 |
| Boosting Video Representation Learning with Multi-Faceted Integration | | 0 |
| Boosting Video-Text Retrieval with Explicit High-Level Semantics | | 0 |
| Bridge Video and Text with Cascade Syntactic Structure | | 0 |
| Bridging Vision and Language: Modeling Causality and Temporality in Video Narratives | | 0 |
| FIOVA: A Multi-Annotator Benchmark for Human-Aligned Video Captioning | | 0 |
| Prediction and Description of Near-Future Activities in Video | | 0 |
| Capturing Rich Behavior Representations: A Dynamic Action Semantic-Aware Graph Transformer for Video Captioning | | 0 |
| Characterizing the impact of using features extracted from pre-trained models on the quality of video captioning sequence-to-sequence models | | 0 |
| Chinese Whispers: Cooperative Paraphrase Acquisition | | 0 |
| Classifier-Guided Captioning Across Modalities | | 0 |
| CLIP4Caption: CLIP for Video Caption | | 0 |
| CoCo-BERT: Improving Video-Language Pre-training with Contrastive Cross-modal Matching and Denoising | | 0 |
| Collaborative Three-Stream Transformers for Video Captioning | | 0 |
| Consensus-based Sequence Training for Video Captioning | | 0 |
| Learning Video Representations using Contrastive Bidirectional Transformer | | 0 |
| CREATE: A Benchmark for Chinese Short Video Retrieval and Title Generation | | 0 |
| CREATE: A Benchmark for Chinese Short Video Retrieval and Title Generation | | 0 |
| CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations | | 0 |
| Crowd Video Captioning | | 0 |
| CUPID: Adaptive Curation of Pre-training Data for Video-and-Language Representation Learning | | 0 |
| Deep Reinforcement Learning for NLP | | 0 |
| Dense Video Captioning: A Survey of Techniques, Datasets and Evaluation Protocols | | 0 |
| Dense Video Captioning using Graph-based Sentence Summarization | | 0 |
| Describe Anything: Detailed Localized Image and Video Captioning | | 0 |
| DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement | | 0 |
| Directed Domain Fine-Tuning: Tailoring Separate Modalities for Specific Training Tasks | | 0 |
| Discourse Analysis for Evaluating Coherence in Video Paragraph Captions | | 0 |
| Diverse Video Captioning Through Latent Variable Expansion | | 0 |
| Dual-Level Decoupled Transformer for Video Captioning | | 0 |
| DVCFlow: Modeling Information Flow Towards Human-like Video Captioning | | 0 |
| E-MMAD: Multimodal Advertising Caption Generation Based on Structured Information | | 0 |
| Empirical Autopsy of Deep Video Captioning Frameworks | | 0 |
| Encoder-Decoder Based Long Short-Term Memory (LSTM) Model for Video Captioning | | 0 |
| End-to-end Concept Word Detection for Video Captioning, Retrieval, and Question Answering | | 0 |
| End-to-end Dense Video Captioning as Sequence Generation | | 0 |
| End-to-end Dense Video Captioning as Sequence Generation | | 0 |
| End-to-end Generative Pretraining for Multimodal Video Captioning | | 0 |
| Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization | | 0 |

Benchmark Results

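| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | mPLUG-2 | CIDEr | 80 | | Unverified |
| 2 | VAST | CIDEr | 78 | | Unverified |
| 3 | GIT2 | CIDEr | 75.9 | | Unverified |
| 4 | VLAB | CIDEr | 74.9 | | Unverified |
| 5 | COSA | CIDEr | 74.7 | | Unverified |
| 6 | VALOR | CIDEr | 74 | | Unverified |
| 7 | MaMMUT (ours) | CIDEr | 73.6 | | Unverified |
| 8 | VideoCoCa | CIDEr | 73.2 | | Unverified |
| 9 | RTQ | CIDEr | 69.3 | | Unverified |
| 10 | HowToCaption | CIDEr | 65.3 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | MaMMUT | CIDEr | 195.6 | | Unverified |
| 2 | VLAB | CIDEr | 179.8 | | Unverified |
| 3 | COSA | CIDEr | 178.5 | | Unverified |
| 4 | VALOR | CIDEr | 178.5 | | Unverified |
| 5 | mPLUG-2 | CIDEr | 165.8 | | Unverified |
| 6 | HowToCaption | CIDEr | 154.2 | | Unverified |
| 7 | HiTeA | CIDEr | 146.9 | | Unverified |
| 8 | Vid2Seq | CIDEr | 146.2 | | Unverified |
| 9 | VIOLETv2 | CIDEr | 139.2 | | Unverified |
| 10 | RTQ | CIDEr | 123.4 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | VAST | BLEU-4 | 18.2 | | Unverified |
| 2 | UniVL + MELTR | BLEU-4 | 17.92 | | Unverified |
| 3 | UniVL | BLEU-4 | 17.35 | | Unverified |
| 4 | VideoCoCa | BLEU-4 | 14.2 | | Unverified |
| 5 | VLM | BLEU-4 | 12.27 | | Unverified |
| 6 | E2vidD6-MASSvid-BiD | BLEU-4 | 12.04 | | Unverified |
| 7 | TextKG | BLEU-4 | 11.7 | | Unverified |
| 8 | COOT | BLEU-4 | 11.3 | | Unverified |
| 9 | COSA | BLEU-4 | 10.1 | | Unverified |
| 10 | HowToCaption | BLEU-4 | 8.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | VALOR | BLEU-4 | 45.6 | | Unverified |
| 2 | VAST | BLEU-4 | 45 | | Unverified |
| 3 | COSA | BLEU-4 | 43.7 | | Unverified |
| 4 | VideoCoCa | BLEU-4 | 39.7 | | Unverified |
| 5 | IcoCap (ViT-B/16) | BLEU-4 | 37.4 | | Unverified |
| 6 | IcoCap (ViT-B/32) | BLEU-4 | 36.9 | | Unverified |
| 7 | VASTA (Kinetics-backbone) | BLEU-4 | 36.25 | | Unverified |
| 8 | CoCap (ViT/L14) | BLEU-4 | 35.8 | | Unverified |
| 9 | ORG-TRL | BLEU-4 | 32.1 | | Unverified |
| 10 | NITS-VC | BLEU-4 | 20 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | VideoCoCa | BLEU4 | 14.7 | | Unverified |
| 2 | VLTinT (ae-test split) C3D/Ling | BLEU4 | 14.5 | | Unverified |
| 3 | VLCap (ae-test split) - Appearance + Language | BLEU4 | 13.38 | | Unverified |
| 4 | COOT (ae-test split) - Only Appearance features | BLEU4 | 10.85 | | Unverified |
| 5 | MART (ae-test split) - Appearance + Flow | BLEU4 | 10.33 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | CEN | CIDEr | 49.87 | | Unverified |
| 2 | GIT | CIDEr | 32.43 | | Unverified |
| 3 | SEM-POS | CIDEr | 26.01 | | Unverified |
| 4 | AKGNN | CIDEr | 25.9 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | CEN | CIDEr | 63.51 | | Unverified |
| 2 | GIT | CIDEr | 45.63 | | Unverified |
| 3 | SEM-POS | CIDEr | 37.16 | | Unverified |
| 4 | AKGNN | CIDEr | 35.08 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | SBD_Keyframe | BLEU4 | 41.01 | | Unverified |
| 2 | V+S-Att-based | BLEU4 | 36.2 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | VAST | BLEU-4 | 19.9 | | Unverified |
| 2 | COSA | BLEU-4 | 18.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GVT | BLEU4 | 17.7 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | VNS-GRU (Cross-Lingual) | BLEU-4 | 58.68 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Shot2Story | CIDEr | 37.4 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Vid2Seq | CIDEr | 120.5 | | Unverified |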