SOTAVerified|Agents Browse Leaderboard About Blog

Video Captioning

Video Captioning is a task of automatic captioning a video by understanding the action and event in the video which can help in the retrieval of the video efficiently through text.

Source: NITS-VC System for VATEX Video Captioning Challenge 2020

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1–10 of 473 papers

Title	Date	Tasks	Status	Hype
UGC-VideoCaptioner: An Omni UGC Video Detail Caption Model and New Benchmarks	Jul 15, 2025	Video CaptioningVideo Understanding	CodeCode Available	1
Show, Tell and Summarize: Dense Video Captioning Using Visual Cue Aided Sentence Summarization	Jun 25, 2025	Dense Video CaptioningDescriptive	—Unverified	0
Dense Video Captioning using Graph-based Sentence Summarization	Jun 25, 2025	Dense Video CaptioningSentence	—Unverified	0
video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models	Jun 18, 2025	Audio captioningLarge Language Model	CodeCode Available	2
VersaVid-R1: A Versatile Video Understanding and Reasoning Model from Question Answering to Captioning Tasks	Jun 10, 2025	Multiple-choiceOpen-Ended Question Answering	—Unverified	0
ARGUS: Hallucination and Omission Evaluation in Video-LLMs	Jun 9, 2025	DescriptiveForm	—Unverified	0
Temporal Object Captioning for Street Scene Videos from LiDAR Tracks	May 22, 2025	Caption GenerationVideo Captioning	—Unverified	0
FLASH: Latent-Aware Semi-Autoregressive Speculative Decoding for Multimodal Tasks	May 19, 2025	Video Captioning	CodeCode Available	0
TimeSoccer: An End-to-End Multimodal Large Language Model for Soccer Commentary Generation	Apr 24, 2025	Caption GenerationDense Video Captioning	—Unverified	0
Describe Anything: Detailed Localized Image and Video Captioning	Apr 22, 2025	SentenceVideo Captioning	—Unverified	0

Show:10 25 50

← PrevPage 1 of 48Next →

All datasets MSR-VTT MSVD YouCook2 VATEX ActivityNet Captions MSRVTT-CTN MSVD-CTN Hindi MSR-VTT TVC ChinaOpen-1k MSVD-Indonesian Shot2Story20K

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	VAST	BLEU-4	18.2	—	Unverified
2	UniVL + MELTR	BLEU-4	17.92	—	Unverified
3	UniVL	BLEU-4	17.35	—	Unverified
4	VideoCoCa	BLEU-4	14.2	—	Unverified
5	VLM	BLEU-4	12.27	—	Unverified
6	E2vidD6-MASSvid-BiD	BLEU-4	12.04	—	Unverified
7	TextKG	BLEU-4	11.7	—	Unverified
8	COOT	BLEU-4	11.3	—	Unverified
9	COSA	BLEU-4	10.1	—	Unverified
10	HowToCaption	BLEU-4	8.8	—	Unverified