SOTAVerified

Video Captioning

Video captioning is the task of automatically generating a caption for a video by understanding the actions and events it contains, which in turn makes the video easier to retrieve through text.

Source: NITS-VC System for VATEX Video Captioning Challenge 2020

Papers

Showing 301-350 of 473 papers

| Title | Status | Hype |
| --- | --- | --- |
| DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement | | 0 |
| PolySmart @ TRECVid 2024 Video Captioning (VTT) | | 0 |
| Describe Anything: Detailed Localized Image and Video Captioning | | 0 |
| End-to-End Video Captioning | | 0 |
| Pre-training for Video Captioning Challenge 2020 Summary | | 0 |
| Procedural Text Generation from an Execution Video | | 0 |
| Progress-Aware Video Frame Captioning | | 0 |
| Dense Video Captioning using Graph-based Sentence Summarization | | 0 |
| Zero-Shot Dense Video Captioning by Jointly Optimizing Text and Moment | | 0 |
| Recent Advances in Video Question Answering: A Review of Datasets and Methods | | 0 |
| Recipe Generation from Unsegmented Cooking Videos | | 0 |
| Reconstruct and Represent Video Contents for Captioning via Reinforcement Learning | | 0 |
| VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending | | 0 |
| Recurrent Memory Addressing for describing videos | | 0 |
| Reexamining Racial Disparities in Automatic Speech Recognition Performance: The Role of Confounding by Provenance | | 0 |
| An Efficient Keyframes Selection Based Framework for Video Captioning | | 0 |
| ReGen: A good Generative Zero-Shot Video Classifier Should be Rewarded | | 0 |
| Reinforced Video Captioning with Entailment Rewards | | 0 |
| Relational Reasoning using Prior Knowledge for Visual Captioning | | 0 |
| Dense Video Captioning: A Survey of Techniques, Datasets and Evaluation Protocols | | 0 |
| Retrieval-Augmented Egocentric Video Captioning | | 0 |
| RETTA: Retrieval-Enhanced Test-Time Adaptation for Zero-Shot Video Captioning | | 0 |
| Deep Reinforcement Learning for NLP | | 0 |
| RUC+CMU: System Report for Dense Captioning Events in Videos | | 0 |
| SACT: Self-Aware Multi-Space Feature Composition Transformer for Multinomial Attention for Video Captioning | | 0 |
| SAVCHOI: Detecting Suspicious Activities using Dense Video Captioning with Human Object Interactions | | 0 |
| SBAT: Video Captioning with Sparse Boundary-Aware Transformer | | 0 |
| Scalable and Accurate Self-supervised Multimodal Representation Learning without Aligned Video and Text Data | | 0 |
| VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding | | 0 |
| An Attempt towards Interpretable Audio-Visual Video Captioning | | 0 |
| Semantic-Aware Pretraining for Dense Video Captioning | | 0 |
| CUPID: Adaptive Curation of Pre-training Data for Video-and-Language Representation Learning | | 0 |
| Analyzing Zero-Shot Abilities of Vision-Language Models on Video Understanding Tasks | | 0 |
| Semi-Supervised Learning for Video Captioning | | 0 |
| SEM-POS: Grammatically and Semantically Correct Video Captioning | | 0 |
| Amortized Context Vector Inference for Sequence-to-Sequence Networks | | 0 |
| Seq2Time: Sequential Knowledge Transfer for Video LLM Temporal Grounding | | 0 |
| Set Prediction Guided by Semantic Concepts for Diverse Video Captioning | | 0 |
| Crowd Video Captioning | | 0 |
| CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations | | 0 |
| CREATE: A Benchmark for Chinese Short Video Retrieval and Title Generation | | 0 |
| Show, Tell and Summarize: Dense Video Captioning Using Visual Cue Aided Sentence Summarization | | 0 |
| Aligning Source Visual and Target Language Domains for Unpaired Video Captioning | | 0 |
| CREATE: A Benchmark for Chinese Short Video Retrieval and Title Generation | | 0 |
| SMArT: Training Shallow Memory-aware Transformers for Robotic Explainability | | 0 |
| SnapCap: Efficient Snapshot Compressive Video Captioning | | 0 |
| Learning Video Representations using Contrastive Bidirectional Transformer | | 0 |
| Agent-based Video Trimming | | 0 |
| Consensus-based Sequence Training for Video Captioning | | 0 |
| Collaborative Three-Stream Transformers for Video Captioning | | 0 |

Benchmark Results

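| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | mPLUG-2 | CIDEr | 80 | | Unverified |
| 2 | VAST | CIDEr | 78 | | Unverified |
| 3 | GIT2 | CIDEr | 75.9 | | Unverified |
| 4 | VLAB | CIDEr | 74.9 | | Unverified |
| 5 | COSA | CIDEr | 74.7 | | Unverified |
| 6 | VALOR | CIDEr | 74 | | Unverified |
| 7 | MaMMUT (ours) | CIDEr | 73.6 | | Unverified |
| 8 | VideoCoCa | CIDEr | 73.2 | | Unverified |
| 9 | RTQ | CIDEr | 69.3 | | Unverified |
| 10 | HowToCaption | CIDEr | 65.3 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | MaMMUT | CIDEr | 195.6 | | Unverified |
| 2 | VLAB | CIDEr | 179.8 | | Unverified |
| 3 | COSA | CIDEr | 178.5 | | Unverified |
| 4 | VALOR | CIDEr | 178.5 | | Unverified |
| 5 | mPLUG-2 | CIDEr | 165.8 | | Unverified |
| 6 | HowToCaption | CIDEr | 154.2 | | Unverified |
| 7 | HiTeA | CIDEr | 146.9 | | Unverified |
| 8 | Vid2Seq | CIDEr | 146.2 | | Unverified |
| 9 | VIOLETv2 | CIDEr | 139.2 | | Unverified |
| 10 | RTQ | CIDEr | 123.4 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | VAST | BLEU-4 | 18.2 | | Unverified |
| 2 | UniVL + MELTR | BLEU-4 | 17.92 | | Unverified |
| 3 | UniVL | BLEU-4 | 17.35 | | Unverified |
| 4 | VideoCoCa | BLEU-4 | 14.2 | | Unverified |
| 5 | VLM | BLEU-4 | 12.27 | | Unverified |
| 6 | E2vidD6-MASSvid-BiD | BLEU-4 | 12.04 | | Unverified |
| 7 | TextKG | BLEU-4 | 11.7 | | Unverified |
| 8 | COOT | BLEU-4 | 11.3 | | Unverified |
| 9 | COSA | BLEU-4 | 10.1 | | Unverified |
| 10 | HowToCaption | BLEU-4 | 8.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | VALOR | BLEU-4 | 45.6 | | Unverified |
| 2 | VAST | BLEU-4 | 45 | | Unverified |
| 3 | COSA | BLEU-4 | 43.7 | | Unverified |
| 4 | VideoCoCa | BLEU-4 | 39.7 | | Unverified |
| 5 | IcoCap (ViT-B/16) | BLEU-4 | 37.4 | | Unverified |
| 6 | IcoCap (ViT-B/32) | BLEU-4 | 36.9 | | Unverified |
| 7 | VASTA (Kinetics-backbone) | BLEU-4 | 36.25 | | Unverified |
| 8 | CoCap (ViT/L14) | BLEU-4 | 35.8 | | Unverified |
| 9 | ORG-TRL | BLEU-4 | 32.1 | | Unverified |
| 10 | NITS-VC | BLEU-4 | 20 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | VideoCoCa | BLEU4 | 14.7 | | Unverified |
| 2 | VLTinT (ae-test split) C3D/Ling | BLEU4 | 14.5 | | Unverified |
| 3 | VLCap (ae-test split) - Appearance + Language | BLEU4 | 13.38 | | Unverified |
| 4 | COOT (ae-test split) - Only Appearance features | BLEU4 | 10.85 | | Unverified |
| 5 | MART (ae-test split) - Appearance + Flow | BLEU4 | 10.33 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | CEN | CIDEr | 49.87 | | Unverified |
| 2 | GIT | CIDEr | 32.43 | | Unverified |
| 3 | SEM-POS | CIDEr | 26.01 | | Unverified |
| 4 | AKGNN | CIDEr | 25.9 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | CEN | CIDEr | 63.51 | | Unverified |
| 2 | GIT | CIDEr | 45.63 | | Unverified |
| 3 | SEM-POS | CIDEr | 37.16 | | Unverified |
| 4 | AKGNN | CIDEr | 35.08 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | SBD_Keyframe | BLEU4 | 41.01 | | Unverified |
| 2 | V+S-Att-based | BLEU4 | 36.2 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | VAST | BLEU-4 | 19.9 | | Unverified |
| 2 | COSA | BLEU-4 | 18.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GVT | BLEU4 | 17.7 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | VNS-GRU (Cross-Lingual) | BLEU-4 | 58.68 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Shot2Story | CIDEr | 37.4 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Vid2Seq | CIDEr | 120.5 | | Unverified |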