Video Description

The goal of automatic Video Description is to tell a story about events happening in a video. While early Video Description methods produced captions for short clips that were manually segmented to contain a single event of interest, more recently dense video captioning has been proposed to both segment distinct events in time and describe them in a series of coherent sentences. This problem is a generalization of dense image region captioning and has many practical applications, such as generating textual summaries for the visually impaired, or detecting and describing important events in surveillance footage.

Source: Joint Event Detection and Description in Continuous Video Streams

Papers

Showing 71-80 of 104 papers

| Title | Status | Hype |
|---|---|---|
| Incorporating Semantic Attention in Video Description Generation | | 0 |
| Integrating both Visual and Audio Cues for Enhanced Video Caption | | 0 |
| Attend and Interact: Higher-Order Object Interactions for Video Understanding | | 0 |
| Predicting Visual Features from Text for Image and Video Caption Retrieval | Code | 0 |
| Incorporating Global Visual Features into Attention-based Neural Machine Translation. | | 0 |
| Task-Driven Dynamic Fusion: Reducing Ambiguity in Video Description | | 0 |
| Egocentric Video Description based on Temporally-Linked Sequences | Code | 0 |
| Attention-Based Multimodal Fusion for Video Description | | 0 |
| Generating Video Description using Sequence-to-sequence Model with Temporal Attention | | 0 |
| Hierarchical Boundary-Aware Neural Encoder for Video Captioning | | 0 |

No leaderboard results yet.