Video Captioning

Video captioning is the task of automatically generating a caption for a video by understanding the actions and events in it, which can help retrieve the video efficiently through text.

Source: NITS-VC System for VATEX Video Captioning Challenge 2020

Papers

Showing 1-10 of 473 papers

Title | Status | Hype
UGC-VideoCaptioner: An Omni UGC Video Detail Caption Model and New Benchmarks | Code | 1
Show, Tell and Summarize: Dense Video Captioning Using Visual Cue Aided Sentence Summarization | - | 0
Dense Video Captioning using Graph-based Sentence Summarization | - | 0
video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models | Code | 2
VersaVid-R1: A Versatile Video Understanding and Reasoning Model from Question Answering to Captioning Tasks | - | 0
ARGUS: Hallucination and Omission Evaluation in Video-LLMs | - | 0
Temporal Object Captioning for Street Scene Videos from LiDAR Tracks | - | 0
FLASH: Latent-Aware Semi-Autoregressive Speculative Decoding for Multimodal Tasks | Code | 0
TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation | - | 0
Describe Anything: Detailed Localized Image and Video Captioning | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GVT | BLEU4 | 17.7 | - | Unverified