SOTAVerified|Agents Browse Leaderboard About

Video Description

The goal of automatic Video Description is to tell a story about events happening in a video. While early Video Description methods produced captions for short clips that were manually segmented to contain a single event of interest, more recently dense video captioning has been proposed to both segment distinct events in time and describe them in a series of coherent sentences. This problem is a generalization of dense image region captioning and has many practical applications, such as generating textual summaries for the visually impaired, or detecting and describing important events in surveillance footage.

Source: Joint Event Detection and Description in Continuous Video Streams

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 41–50 of 104 papers

Title	Date	Tasks	Status
Attend and Interact: Higher-Order Object Interactions for Video Understanding	Nov 16, 2017	Action ClassificationAction Recognition	—Unverified
CLearViD: Curriculum Learning for Video Description	Nov 8, 2023	DiversityVideo Description	—Unverified
Prediction and Description of Near-Future Activities in Video	Aug 2, 2019	PredictionVideo Captioning	—Unverified
A Thousand Frames in Just a Few Words: Lingual Description of Videos through Latent Topics and Sparse Object Stitching	Jun 1, 2013	Image DescriptionVideo Description	—Unverified
A Labelled Dataset for Sentiment Analysis of Videos on YouTube, TikTok, and Other Sources about the 2024 Outbreak of Measles	Jun 11, 2024	Sentiment AnalysisSubjectivity Analysis	—Unverified
Active Learning for Video Description With Cluster-Regularized Ensemble Ranking	Jul 27, 2020	Active LearningVideo Captioning	—Unverified
FIOVA: A Multi-Annotator Benchmark for Human-Aligned Video Captioning	Oct 20, 2024	DiagnosticVideo Captioning	—Unverified
Incorporating Background Knowledge into Video Description Generation	Oct 1, 2018	DecoderText Generation	—Unverified
Incorporating Global Visual Features into Attention-based Neural Machine Translation.	Sep 1, 2017	DecoderMachine Translation	—Unverified
HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation	Mar 31, 2025	HallucinationHuman-Object Interaction Detection	—Unverified

Show:10 25 50

← PrevPage 5 of 11Next →

No leaderboard results yet.