
Video Description

The goal of automatic Video Description is to tell a story about events happening in a video. While early Video Description methods produced captions for short clips that were manually segmented to contain a single event of interest, more recently dense video captioning has been proposed to both segment distinct events in time and describe them in a series of coherent sentences. This problem is a generalization of dense image region captioning and has many practical applications, such as generating textual summaries for the visually impaired, or detecting and describing important events in surveillance footage.

Source: Joint Event Detection and Description in Continuous Video Streams

Papers


Showing 21–30 of 104 papers

Title	Date	Tasks	Status
AVD2: Accident Video Diffusion for Accident Video Description	Feb 20, 2025	Autonomous Driving, Scene Understanding	Unverified
Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis	Feb 11, 2025	Action Recognition, Video Description	Unverified
Towards Zero-Shot & Explainable Video Description by Reasoning over Graphs of Events in Space and Time	Jan 14, 2025	Object Recognition, Text Generation	Unverified
Implicit Location-Caption Alignment via Complementary Masking for Weakly-Supervised Dense Video Captioning	Dec 17, 2024	Dense Video Captioning, Descriptive	Code Available
PV-VTT: A Privacy-Centric Dataset for Mission-Specific Anomaly Detection and Natural Language Interpretation	Oct 30, 2024	Anomaly Detection, Descriptive	Unverified
FIOVA: A Multi-Annotator Benchmark for Human-Aligned Video Captioning	Oct 20, 2024	Diagnostic, Video Captioning	Unverified
VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models	Oct 1, 2024	Hallucination, text similarity	Unverified
Technical Report: Competition Solution For Modelscope-Sora	Sep 24, 2024	Text-to-Video Generation, Video Description	Unverified
Kubrick: Multimodal Agent Collaborations for Synthetic Video Generation	Aug 19, 2024	Instruction Following, Large Language Model	Unverified
SUSTechGAN: Image Generation for Object Detection in Adverse Conditions of Autonomous Driving	Jul 18, 2024	Autonomous Driving, Image Generation	Code Available


No leaderboard results yet.