SOTAVerified

Video Captioning

Video captioning is the task of automatically generating a caption for a video by understanding the actions and events in it, which in turn enables efficient text-based retrieval of the video.

Source: NITS-VC System for VATEX Video Captioning Challenge 2020

Papers

Showing 126–150 of 473 papers

Title | Status | Hype
Enhancing Traffic Safety with Parallel Dense Video Captioning for End-to-End Event Analysis | Code | 1
G-VEval: A Versatile Metric for Evaluating Image and Video Captions Using GPT-4o | Code | 1
Neuro-Symbolic Representations for Video Captioning: A Case for Leveraging Inductive Biases for Vision and Language | Code | 1
Tell me what you see: A zero-shot action recognition method based on natural language descriptions | Code | 1
From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping | Code | 1
Unifying Event Detection and Captioning as Sequence Generation via Pre-Training | Code | 1
VLCap: Vision-Language with Contrastive Learning for Coherent Video Paragraph Captioning | Code | 1
End-to-End Video Captioning with Multitask Reinforcement Learning | Code | 0
SoccerNet 2024 Challenges Results | Code | 0
StoryDALL-E: Adapting Pretrained Text-to-Image Transformers for Story Continuation | Code | 0
End-to-End Dense Video Captioning with Masked Transformer | Code | 0
Sketch, Ground, and Refine: Top-Down Dense Video Captioning | Code | 0
Streamlined Dense Video Captioning | Code | 0
Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal Attention | Code | 0
Event and Entity Extraction from Generated Video Captions | Code | 0
Effectively Leveraging CLIP for Generating Situational Summaries of Images and Videos | Code | 0
Video captioning with stacked attention and semantic hard pull | Code | 0
Edit As You Wish: Video Caption Editing with Multi-grained User Control | Code | 0
ECO: Efficient Convolutional Network for Online Video Understanding | Code | 0
Support-set based Multi-modal Representation Enhancement for Video Captioning | Code | 0
Reconstruction Network for Video Captioning | Code | 0
Dual-Stream Transformer for Generic Event Boundary Captioning | Code | 0
Refined Semantic Enhancement towards Frequency Diffusion for Video Captioning | Code | 0
Accommodating Audio Modality in CLIP for Multimodal Processing | Code | 0
Pseudo-labeling with Keyword Refining for Few-Supervised Video Captioning | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | mPLUG-2 | CIDEr | 80 | | Unverified
2 | VAST | CIDEr | 78 | | Unverified
3 | GIT2 | CIDEr | 75.9 | | Unverified
4 | VLAB | CIDEr | 74.9 | | Unverified
5 | COSA | CIDEr | 74.7 | | Unverified
6 | VALOR | CIDEr | 74 | | Unverified
7 | MaMMUT (ours) | CIDEr | 73.6 | | Unverified
8 | VideoCoCa | CIDEr | 73.2 | | Unverified
9 | RTQ | CIDEr | 69.3 | | Unverified
10 | HowToCaption | CIDEr | 65.3 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MaMMUT | CIDEr | 195.6 | | Unverified
2 | VLAB | CIDEr | 179.8 | | Unverified
3 | COSA | CIDEr | 178.5 | | Unverified
4 | VALOR | CIDEr | 178.5 | | Unverified
5 | mPLUG-2 | CIDEr | 165.8 | | Unverified
6 | HowToCaption | CIDEr | 154.2 | | Unverified
7 | HiTeA | CIDEr | 146.9 | | Unverified
8 | Vid2Seq | CIDEr | 146.2 | | Unverified
9 | VIOLETv2 | CIDEr | 139.2 | | Unverified
10 | RTQ | CIDEr | 123.4 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VAST | BLEU-4 | 18.2 | | Unverified
2 | UniVL + MELTR | BLEU-4 | 17.92 | | Unverified
3 | UniVL | BLEU-4 | 17.35 | | Unverified
4 | VideoCoCa | BLEU-4 | 14.2 | | Unverified
5 | VLM | BLEU-4 | 12.27 | | Unverified
6 | E2vidD6-MASSvid-BiD | BLEU-4 | 12.04 | | Unverified
7 | TextKG | BLEU-4 | 11.7 | | Unverified
8 | COOT | BLEU-4 | 11.3 | | Unverified
9 | COSA | BLEU-4 | 10.1 | | Unverified
10 | HowToCaption | BLEU-4 | 8.8 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VALOR | BLEU-4 | 45.6 | | Unverified
2 | VAST | BLEU-4 | 45 | | Unverified
3 | COSA | BLEU-4 | 43.7 | | Unverified
4 | VideoCoCa | BLEU-4 | 39.7 | | Unverified
5 | IcoCap (ViT-B/16) | BLEU-4 | 37.4 | | Unverified
6 | IcoCap (ViT-B/32) | BLEU-4 | 36.9 | | Unverified
7 | VASTA (Kinetics-backbone) | BLEU-4 | 36.25 | | Unverified
8 | CoCap (ViT/L14) | BLEU-4 | 35.8 | | Unverified
9 | ORG-TRL | BLEU-4 | 32.1 | | Unverified
10 | NITS-VC | BLEU-4 | 20 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VideoCoCa | BLEU4 | 14.7 | | Unverified
2 | VLTinT (ae-test split) C3D/Ling | BLEU4 | 14.5 | | Unverified
3 | VLCap (ae-test split) - Appearance + Language | BLEU4 | 13.38 | | Unverified
4 | COOT (ae-test split) - Only Appearance features | BLEU4 | 10.85 | | Unverified
5 | MART (ae-test split) - Appearance + Flow | BLEU4 | 10.33 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CEN | CIDEr | 49.87 | | Unverified
2 | GIT | CIDEr | 32.43 | | Unverified
3 | SEM-POS | CIDEr | 26.01 | | Unverified
4 | AKGNN | CIDEr | 25.9 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CEN | CIDEr | 63.51 | | Unverified
2 | GIT | CIDEr | 45.63 | | Unverified
3 | SEM-POS | CIDEr | 37.16 | | Unverified
4 | AKGNN | CIDEr | 35.08 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SBD_Keyframe | BLEU4 | 41.01 | | Unverified
2 | V+S-Att-based | BLEU4 | 36.2 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VAST | BLEU-4 | 19.9 | | Unverified
2 | COSA | BLEU-4 | 18.8 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | GVT | BLEU4 | 17.7 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VNS-GRU (Cross-Lingual) | BLEU-4 | 58.68 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Shot2Story | CIDEr | 37.4 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Vid2Seq | CIDEr | 120.5 | | Unverified