SOTAVerified

Video Captioning

Video captioning is the task of automatically generating a caption for a video by understanding the actions and events it contains, which in turn supports efficient text-based retrieval of the video.

Source: NITS-VC System for VATEX Video Captioning Challenge 2020

Papers

Showing 1-10 of 473 papers

Title | Status | Hype
UGC-VideoCaptioner: An Omni UGC Video Detail Caption Model and New Benchmarks | Code | 1
Show, Tell and Summarize: Dense Video Captioning Using Visual Cue Aided Sentence Summarization | - | 0
Dense Video Captioning using Graph-based Sentence Summarization | - | 0
video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models | Code | 2
VersaVid-R1: A Versatile Video Understanding and Reasoning Model from Question Answering to Captioning Tasks | - | 0
ARGUS: Hallucination and Omission Evaluation in Video-LLMs | - | 0
Temporal Object Captioning for Street Scene Videos from LiDAR Tracks | - | 0
FLASH: Latent-Aware Semi-Autoregressive Speculative Decoding for Multimodal Tasks | Code | 0
TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation | - | 0
Describe Anything: Detailed Localized Image and Video Captioning | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | VALOR | BLEU-4 | 45.6 | - | Unverified
2 | VAST | BLEU-4 | 45 | - | Unverified
3 | COSA | BLEU-4 | 43.7 | - | Unverified
4 | VideoCoCa | BLEU-4 | 39.7 | - | Unverified
5 | IcoCap (ViT-B/16) | BLEU-4 | 37.4 | - | Unverified
6 | IcoCap (ViT-B/32) | BLEU-4 | 36.9 | - | Unverified
7 | VASTA (Kinetics-backbone) | BLEU-4 | 36.25 | - | Unverified
8 | CoCap (ViT/L14) | BLEU-4 | 35.8 | - | Unverified
9 | ORG-TRL | BLEU-4 | 32.1 | - | Unverified
10 | NITS-VC | BLEU-4 | 20 | - | Unverified