Video Description

The goal of automatic Video Description is to tell a story about events happening in a video. While early Video Description methods produced captions for short clips that were manually segmented to contain a single event of interest, more recently dense video captioning has been proposed to both segment distinct events in time and describe them in a series of coherent sentences. This problem is a generalization of dense image region captioning and has many practical applications, such as generating textual summaries for the visually impaired, or detecting and describing important events in surveillance footage.

Source: Joint Event Detection and Description in Continuous Video Streams

Papers

Showing 61–70 of 104 papers

Title	Date	Tasks	Status
Towards Zero-Shot & Explainable Video Description by Reasoning over Graphs of Events in Space and Time	Jan 14, 2025	Object Recognition, Text Generation	Unverified
Unbox the Blackbox: Predict and Interpret YouTube Viewership Using Deep Learning	Dec 21, 2020	Misinformation, Prediction	Unverified
Vectors of Locally Aggregated Centers for Compact Video Representation	Sep 13, 2015	Clustering, Video Description	Unverified
VideoA11y: Method and Dataset for Accessible Video Description	Feb 27, 2025	Video Description	Unverified
VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models	Oct 1, 2024	Hallucination, text similarity	Unverified
Video Description: A Survey of Methods, Datasets and Evaluation Metrics	Jun 1, 2018	Diversity, Language Modeling	Unverified
VideoMCC: a New Benchmark for Video Comprehension	Jun 23, 2016	Multiple-choice, Video Description	Unverified
Visual-aware Attention Dual-stream Decoder for Video Captioning	Oct 16, 2021	Decoder, Video Captioning	Unverified
A Comprehensive Review on Recent Methods and Challenges of Video Description	Nov 30, 2020	Machine Translation, Survey	Unverified
X-VARS: Introducing Explainability in Football Refereeing with Multi-Modal Large Language Model	Apr 7, 2024	Action Recognition, Decision Making	Unverified
