SOTAVerified

Video Captioning

Video captioning is the task of automatically generating a caption for a video by understanding the actions and events in it, which helps retrieve the video efficiently through text.

Source: NITS-VC System for VATEX Video Captioning Challenge 2020

Papers

Showing 401-450 of 473 papers

Title | Status | Hype
Zero-Shot Dense Video Captioning by Jointly Optimizing Text and Moment | | 0
Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data | Code | 0
Live Video Captioning | Code | 0
Video captioning with stacked attention and semantic hard pull | Code | 0
Event and Entity Extraction from Generated Video Captions | Code | 0
Joint Event Detection and Description in Continuous Video Streams | Code | 0
Implicit Location-Caption Alignment via Complementary Masking for Weakly-Supervised Dense Video Captioning | Code | 0
Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal Attention | Code | 0
https://arxiv.org/abs/2407.00634 | Code | 0
FocusedAD: Character-centric Movie Audio Description | Code | 0
Screencast Tutorial Video Understanding | Code | 0
Refined Semantic Enhancement towards Frequency Diffusion for Video Captioning | Code | 0
Reconstruction Network for Video Captioning | Code | 0
Video Summarization: Towards Entity-Aware Captions | Code | 0
Sketch, Ground, and Refine: Top-Down Dense Video Captioning | Code | 0
Pseudo-labeling with Keyword Refining for Few-Supervised Video Captioning | Code | 0
FLASH: Latent-Aware Semi-Autoregressive Speculative Decoding for Multimodal Tasks | Code | 0
FIBER: Fill-in-the-Blanks as a Challenging Video Understanding Evaluation Framework | Code | 0
Pretrained Image-Text Models are Secretly Video Captioners | Code | 0
SoccerNet 2024 Challenges Results | Code | 0
OSVidCap: A Framework for the Simultaneous Recognition and Description of Concurrent Actions in Videos in an Open-Set Scenario | Code | 0
Oracle performance for visual captioning | Code | 0
Watch, Listen, and Describe: Globally and Locally Aligned Cross-Modal Attentions for Video Captioning | Code | 0
Cross-Modal Graph with Meta Concepts for Video Captioning | Code | 0
A Neural, Interactive-predictive System for Multimodal Sequence to Sequence Tasks | Code | 0
VG-TVP: Multimodal Procedural Planning via Visually Grounded Text-Video Prompting | Code | 0
OmniNet: A unified architecture for multi-modal multi-task learning | Code | 0
Cross-Modal and Hierarchical Modeling of Video and Text | Code | 0
Excitation Backprop for RNNs | Code | 0
Enriching Video Captions With Contextual Text | Code | 0
StoryDALL-E: Adapting Pretrained Text-to-Image Transformers for Story Continuation | Code | 0
End-to-End Video Captioning with Multitask Reinforcement Learning | Code | 0
End-to-End Dense Video Captioning with Masked Transformer | Code | 0
Effectively Leveraging CLIP for Generating Situational Summaries of Images and Videos | Code | 0
Streamlined Dense Video Captioning | Code | 0
VideoBERT: A Joint Model for Video and Language Representation Learning | Code | 0
ActBERT: Learning Global-Local Video-Text Representations | Code | 0
Controllable Video Captioning with POS Sequence Guidance Based on Gated Fusion Network | Code | 0
Support-set based Multi-modal Representation Enhancement for Video Captioning | Code | 0
Non-Autoregressive Coarse-to-Fine Video Captioning | Code | 0
M-VAD Names: a Dataset for Video Captioning with Naming | Code | 0
Syntax Customized Video Captioning by Imitating Exemplar Sentences | Code | 0
Multi-attention Networks for Temporal Localization of Video-level Labels | Code | 0
Visual Transformation Telling | Code | 0
A Survey of Video Datasets for Grounded Event Understanding | Code | 0
Continual and Multi-Task Architecture Search | Code | 0
Accommodating Audio Modality in CLIP for Multimodal Processing | Code | 0
MTLE: A Multitask Learning Encoder of Visual Feature Representations for Video and Movie Description | Code | 0
Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning | Code | 0
Contextual Explainable Video Representation: Human Perception-based Understanding | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | mPLUG-2 | CIDEr | 80 | | Unverified
2 | VAST | CIDEr | 78 | | Unverified
3 | GIT2 | CIDEr | 75.9 | | Unverified
4 | VLAB | CIDEr | 74.9 | | Unverified
5 | COSA | CIDEr | 74.7 | | Unverified
6 | VALOR | CIDEr | 74 | | Unverified
7 | MaMMUT (ours) | CIDEr | 73.6 | | Unverified
8 | VideoCoCa | CIDEr | 73.2 | | Unverified
9 | RTQ | CIDEr | 69.3 | | Unverified
10 | HowToCaption | CIDEr | 65.3 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MaMMUT | CIDEr | 195.6 | | Unverified
2 | VLAB | CIDEr | 179.8 | | Unverified
3 | COSA | CIDEr | 178.5 | | Unverified
4 | VALOR | CIDEr | 178.5 | | Unverified
5 | mPLUG-2 | CIDEr | 165.8 | | Unverified
6 | HowToCaption | CIDEr | 154.2 | | Unverified
7 | HiTeA | CIDEr | 146.9 | | Unverified
8 | Vid2Seq | CIDEr | 146.2 | | Unverified
9 | VIOLETv2 | CIDEr | 139.2 | | Unverified
10 | RTQ | CIDEr | 123.4 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VAST | BLEU-4 | 18.2 | | Unverified
2 | UniVL + MELTR | BLEU-4 | 17.92 | | Unverified
3 | UniVL | BLEU-4 | 17.35 | | Unverified
4 | VideoCoCa | BLEU-4 | 14.2 | | Unverified
5 | VLM | BLEU-4 | 12.27 | | Unverified
6 | E2vidD6-MASSvid-BiD | BLEU-4 | 12.04 | | Unverified
7 | TextKG | BLEU-4 | 11.7 | | Unverified
8 | COOT | BLEU-4 | 11.3 | | Unverified
9 | COSA | BLEU-4 | 10.1 | | Unverified
10 | HowToCaption | BLEU-4 | 8.8 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VALOR | BLEU-4 | 45.6 | | Unverified
2 | VAST | BLEU-4 | 45 | | Unverified
3 | COSA | BLEU-4 | 43.7 | | Unverified
4 | VideoCoCa | BLEU-4 | 39.7 | | Unverified
5 | IcoCap (ViT-B/16) | BLEU-4 | 37.4 | | Unverified
6 | IcoCap (ViT-B/32) | BLEU-4 | 36.9 | | Unverified
7 | VASTA (Kinetics-backbone) | BLEU-4 | 36.25 | | Unverified
8 | CoCap (ViT/L14) | BLEU-4 | 35.8 | | Unverified
9 | ORG-TRL | BLEU-4 | 32.1 | | Unverified
10 | NITS-VC | BLEU-4 | 20 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VideoCoCa | BLEU4 | 14.7 | | Unverified
2 | VLTinT (ae-test split) C3D/Ling | BLEU4 | 14.5 | | Unverified
3 | VLCap (ae-test split) - Appearance + Language | BLEU4 | 13.38 | | Unverified
4 | COOT (ae-test split) - Only Appearance features | BLEU4 | 10.85 | | Unverified
5 | MART (ae-test split) - Appearance + Flow | BLEU4 | 10.33 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CEN | CIDEr | 49.87 | | Unverified
2 | GIT | CIDEr | 32.43 | | Unverified
3 | SEM-POS | CIDEr | 26.01 | | Unverified
4 | AKGNN | CIDEr | 25.9 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CEN | CIDEr | 63.51 | | Unverified
2 | GIT | CIDEr | 45.63 | | Unverified
3 | SEM-POS | CIDEr | 37.16 | | Unverified
4 | AKGNN | CIDEr | 35.08 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SBD_Keyframe | BLEU4 | 41.01 | | Unverified
2 | V+S-Att-based | BLEU4 | 36.2 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VAST | BLEU-4 | 19.9 | | Unverified
2 | COSA | BLEU-4 | 18.8 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | GVT | BLEU4 | 17.7 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VNS-GRU (Cross-Lingual) | BLEU-4 | 58.68 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Shot2Story | CIDEr | 37.4 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Vid2Seq | CIDEr | 120.5 | | Unverified