
Video Description

The goal of automatic Video Description is to tell a story about events happening in a video. While early Video Description methods produced captions for short clips that were manually segmented to contain a single event of interest, more recently dense video captioning has been proposed to both segment distinct events in time and describe them in a series of coherent sentences. This problem is a generalization of dense image region captioning and has many practical applications, such as generating textual summaries for the visually impaired, or detecting and describing important events in surveillance footage.

Source: Joint Event Detection and Description in Continuous Video Streams

Papers


Showing 21–30 of 104 papers

Title	Date	Tasks	Status
Prediction and Description of Near-Future Activities in Video	Aug 2, 2019	Prediction, Video Captioning	Unverified
A Thousand Frames in Just a Few Words: Lingual Description of Videos through Latent Topics and Sparse Object Stitching	Jun 1, 2013	Image Description, Video Description	Unverified
A Labelled Dataset for Sentiment Analysis of Videos on YouTube, TikTok, and Other Sources about the 2024 Outbreak of Measles	Jun 11, 2024	Sentiment Analysis, Subjectivity Analysis	Unverified
CLearViD: Curriculum Learning for Video Description	Nov 8, 2023	Diversity, Video Description	Unverified
Coherent Multi-Sentence Video Description with Variable Level of Detail	Mar 24, 2014	Sentence, Video Description	Unverified
Cross-Modal Learning for Music-to-Music-Video Description Generation	Mar 14, 2025	Video Description, Video Generation	Unverified
DANTE-AD: Dual-Vision Attention Network for Long-Term Audio Description	Mar 31, 2025	Video Description, Video Understanding	Unverified
Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis	Feb 11, 2025	Action Recognition, Video Description	Unverified
Attention-Based Multimodal Fusion for Video Description	Jan 11, 2017	Decoder, Sentence	Unverified
FIOVA: A Multi-Annotator Benchmark for Human-Aligned Video Captioning	Oct 20, 2024	Diagnostic, Video Captioning	Unverified



No leaderboard results yet.