SOTAVerified

Video Captioning

Video captioning is the task of automatically generating a caption for a video by understanding the actions and events it contains, which can help retrieve the video efficiently through text.

Source: NITS-VC System for VATEX Video Captioning Challenge 2020

Papers

Showing 1-10 of 473 papers

Title | Status | Hype
UGC-VideoCaptioner: An Omni UGC Video Detail Caption Model and New Benchmarks | Code | 1
Show, Tell and Summarize: Dense Video Captioning Using Visual Cue Aided Sentence Summarization | | 0
Dense Video Captioning using Graph-based Sentence Summarization | | 0
video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models | Code | 2
VersaVid-R1: A Versatile Video Understanding and Reasoning Model from Question Answering to Captioning Tasks | | 0
ARGUS: Hallucination and Omission Evaluation in Video-LLMs | | 0
Temporal Object Captioning for Street Scene Videos from LiDAR Tracks | | 0
FLASH: Latent-Aware Semi-Autoregressive Speculative Decoding for Multimodal Tasks | Code | 0
TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation | | 0
Describe Anything: Detailed Localized Image and Video Captioning | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | VAST | BLEU-4 | 18.2 | | Unverified
2 | UniVL + MELTR | BLEU-4 | 17.92 | | Unverified
3 | UniVL | BLEU-4 | 17.35 | | Unverified
4 | VideoCoCa | BLEU-4 | 14.2 | | Unverified
5 | VLM | BLEU-4 | 12.27 | | Unverified
6 | E2vidD6-MASSvid-BiD | BLEU-4 | 12.04 | | Unverified
7 | TextKG | BLEU-4 | 11.7 | | Unverified
8 | COOT | BLEU-4 | 11.3 | | Unverified
9 | COSA | BLEU-4 | 10.1 | | Unverified
10 | HowToCaption | BLEU-4 | 8.8 | | Unverified