
Video Description

The goal of automatic Video Description is to tell a story about events happening in a video. While early Video Description methods produced captions for short clips that were manually segmented to contain a single event of interest, more recently dense video captioning has been proposed to both segment distinct events in time and describe them in a series of coherent sentences. This problem is a generalization of dense image region captioning and has many practical applications, such as generating textual summaries for the visually impaired, or detecting and describing important events in surveillance footage.

Source: Joint Event Detection and Description in Continuous Video Streams

Papers

Showing 61–70 of 104 papers

Title | Status | Hype
Incorporating Background Knowledge into Video Description Generation | | 0
A Dataset for Telling the Stories of Social Media Videos | | 0
Attentive Sequence to Sequence Translation for Localizing Clips of Interest by Natural Language Descriptions | | 0
Bridge Video and Text with Cascade Syntactic Structure | | 0
Multimodal Neural Machine Translation for Low-resource Language Pairs using Synthetic Data | | 0
End-to-End Audio Visual Scene-Aware Dialog using Multimodal Attention-Based Video Features | Code | 0
Interpretable Video Captioning via Trajectory Structured Localization | | 0
Audio Visual Scene-Aware Dialog (AVSD) Challenge at DSTC7 | Code | 1
Video Description: A Survey of Methods, Datasets and Evaluation Metrics | | 0
Incorporating Semantic Attention in Video Description Generation | | 0

No leaderboard results yet.