SOTAVerified

Video Captioning

Video captioning is the task of automatically generating a caption for a video by understanding the actions and events it contains, which can help retrieve the video efficiently through text.

Source: NITS-VC System for VATEX Video Captioning Challenge 2020

Papers

Showing 151-200 of 473 papers

Title | Status | Hype
Temporal Tessellation: A Unified Approach for Video Analysis | Code | 0
Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning | Code | 0
Top-down Visual Saliency Guided by Captions | Code | 0
Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning | Code | 0
Towards Automatic Learning of Procedures from Web Instructional Videos | Code | 0
Visual Transformation Telling | Code | 0
Streaming Dense Video Captioning | Code | 0
https://arxiv.org/abs/2407.00634 | Code | 0
Streamlined Dense Video Captioning | Code | 0
SoccerNet 2024 Challenges Results | Code | 0
Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data | Code | 0
BERTHA: Video Captioning Evaluation Via Transfer-Learned Human Assessment | Code | 0
Support-set based Multi-modal Representation Enhancement for Video Captioning | Code | 0
Sketch, Ground, and Refine: Top-Down Dense Video Captioning | Code | 0
Event and Entity Extraction from Generated Video Captions | Code | 0
Screencast Tutorial Video Understanding | Code | 0
Video captioning with stacked attention and semantic hard pull | Code | 0
Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal Attention | Code | 0
Cross-Modal and Hierarchical Modeling of Video and Text | Code | 0
Hierarchical Banzhaf Interaction for General Video-Language Representation Learning | Code | 0
Reconstruction Network for Video Captioning | Code | 0
Pseudo-labeling with Keyword Refining for Few-Supervised Video Captioning | Code | 0
Refined Semantic Enhancement towards Frequency Diffusion for Video Captioning | Code | 0
Pretrained Image-Text Models are Secretly Video Captioners | Code | 0
Analyzing Zero-Shot Abilities of Vision-Language Models on Video Understanding Tasks | Code | 0
OSVidCap: A Framework for the Simultaneous Recognition and Description of Concurrent Actions in Videos in an Open-Set Scenario | Code | 0
Cross-Modal Graph with Meta Concepts for Video Captioning | Code | 0
ActBERT: Learning Global-Local Video-Text Representations | Code | 0
Controllable Video Captioning with POS Sequence Guidance Based on Gated Fusion Network | Code | 0
Oracle performance for visual captioning | Code | 0
NMT-Keras: a Very Flexible Toolkit with a Focus on Interactive NMT and Online Learning | Code | 0
M-VAD Names: a Dataset for Video Captioning with Naming | Code | 0
Non-Autoregressive Coarse-to-Fine Video Captioning | Code | 0
Continual and Multi-Task Architecture Search | Code | 0
Deep Learning for Video Classification and Captioning | Code | 0
Contextual Explainable Video Representation: Human Perception-based Understanding | Code | 0
FocusedAD: Character-centric Movie Audio Description | Code | 0
FLASH: Latent-Aware Semi-Autoregressive Speculative Decoding for Multimodal Tasks | Code | 0
Delving Deeper into Convolutional Networks for Learning Video Representations | Code | 0
StoryDALL-E: Adapting Pretrained Text-to-Image Transformers for Story Continuation | Code | 0
ContCap: A scalable framework for continual image captioning | Code | 0
MTLE: A Multitask Learning Encoder of Visual Feature Representations for Video and Movie Description | Code | 0
A Survey of Video Datasets for Grounded Event Understanding | Code | 0
Multi-attention Networks for Temporal Localization of Video-level Labels | Code | 0
MSVD-Indonesian: A Benchmark for Multimodal Video-Text Tasks in Indonesian | Code | 0
OmniNet: A unified architecture for multi-modal multi-task learning | Code | 0
FIBER: Fill-in-the-Blanks as a Challenging Video Understanding Evaluation Framework | Code | 0
A Semantics-Assisted Video Captioning Model Trained with Scheduled Sampling | Code | 0
Membership Inference Attacks on Sequence-to-Sequence Models: Is My Data In Your Machine Translation System? | Code | 0
Meaning guided video captioning | Code | 0

Benchmark Results

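# | Model | Metric | Claimed | Verified | Status
1 | mPLUG-2 | CIDEr | 80 | - | Unverified
2 | VAST | CIDEr | 78 | - | Unverified
3 | GIT2 | CIDEr | 75.9 | - | Unverified
4 | VLAB | CIDEr | 74.9 | - | Unverified
5 | COSA | CIDEr | 74.7 | - | Unverified
6 | VALOR | CIDEr | 74 | - | Unverified
7 | MaMMUT (ours) | CIDEr | 73.6 | - | Unverified
8 | VideoCoCa | CIDEr | 73.2 | - | Unverified
9 | RTQ | CIDEr | 69.3 | - | Unverified
10 | HowToCaption | CIDEr | 65.3 | - | Unverified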
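
# | Model | Metric | Claimed | Verified | Status
1 | MaMMUT | CIDEr | 195.6 | - | Unverified
2 | VLAB | CIDEr | 179.8 | - | Unverified
3 | COSA | CIDEr | 178.5 | - | Unverified
4 | VALOR | CIDEr | 178.5 | - | Unverified
5 | mPLUG-2 | CIDEr | 165.8 | - | Unverified
6 | HowToCaption | CIDEr | 154.2 | - | Unverified
7 | HiTeA | CIDEr | 146.9 | - | Unverified
8 | Vid2Seq | CIDEr | 146.2 | - | Unverified
9 | VIOLETv2 | CIDEr | 139.2 | - | Unverified
10 | RTQ | CIDEr | 123.4 | - | Unverified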
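
# | Model | Metric | Claimed | Verified | Status
1 | VAST | BLEU-4 | 18.2 | - | Unverified
2 | UniVL + MELTR | BLEU-4 | 17.92 | - | Unverified
3 | UniVL | BLEU-4 | 17.35 | - | Unverified
4 | VideoCoCa | BLEU-4 | 14.2 | - | Unverified
5 | VLM | BLEU-4 | 12.27 | - | Unverified
6 | E2vidD6-MASSvid-BiD | BLEU-4 | 12.04 | - | Unverified
7 | TextKG | BLEU-4 | 11.7 | - | Unverified
8 | COOT | BLEU-4 | 11.3 | - | Unverified
9 | COSA | BLEU-4 | 10.1 | - | Unverified
10 | HowToCaption | BLEU-4 | 8.8 | - | Unverified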

# | Model | Metric | Claimed | Verified | Status
1 | VALOR | BLEU-4 | 45.6 | - | Unverified
2 | VAST | BLEU-4 | 45 | - | Unverified
3 | COSA | BLEU-4 | 43.7 | - | Unverified
4 | VideoCoCa | BLEU-4 | 39.7 | - | Unverified
5 | IcoCap (ViT-B/16) | BLEU-4 | 37.4 | - | Unverified
6 | IcoCap (ViT-B/32) | BLEU-4 | 36.9 | - | Unverified
7 | VASTA (Kinetics-backbone) | BLEU-4 | 36.25 | - | Unverified
8 | CoCap (ViT/L14) | BLEU-4 | 35.8 | - | Unverified
9 | ORG-TRL | BLEU-4 | 32.1 | - | Unverified
10 | NITS-VC | BLEU-4 | 20 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VideoCoCa | BLEU4 | 14.7 | - | Unverified
2 | VLTinT (ae-test split) C3D/Ling | BLEU4 | 14.5 | - | Unverified
3 | VLCap (ae-test split) - Appearance + Language | BLEU4 | 13.38 | - | Unverified
4 | COOT (ae-test split) - Only Appearance features | BLEU4 | 10.85 | - | Unverified
5 | MART (ae-test split) - Appearance + Flow | BLEU4 | 10.33 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CEN | CIDEr | 49.87 | - | Unverified
2 | GIT | CIDEr | 32.43 | - | Unverified
3 | SEM-POS | CIDEr | 26.01 | - | Unverified
4 | AKGNN | CIDEr | 25.9 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CEN | CIDEr | 63.51 | - | Unverified
2 | GIT | CIDEr | 45.63 | - | Unverified
3 | SEM-POS | CIDEr | 37.16 | - | Unverified
4 | AKGNN | CIDEr | 35.08 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SBD_Keyframe | BLEU4 | 41.01 | - | Unverified
2 | V+S-Att-based | BLEU4 | 36.2 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VAST | BLEU-4 | 19.9 | - | Unverified
2 | COSA | BLEU-4 | 18.8 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | GVT | BLEU4 | 17.7 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VNS-GRU (Cross-Lingual) | BLEU-4 | 58.68 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Shot2Story | CIDEr | 37.4 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Vid2Seq | CIDEr | 120.5 | - | Unverified