
Video Description

The goal of automatic Video Description is to tell a story about events happening in a video. While early Video Description methods produced captions for short clips that were manually segmented to contain a single event of interest, more recently dense video captioning has been proposed to both segment distinct events in time and describe them in a series of coherent sentences. This problem is a generalization of dense image region captioning and has many practical applications, such as generating textual summaries for the visually impaired, or detecting and describing important events in surveillance footage.

Source: Joint Event Detection and Description in Continuous Video Streams
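Dense video captioning outputs a set of events, each localized in time and paired with a sentence. The sketch below shows only the last step: turning those captioned segments into a temporally ordered story. It assumes each event is stored as a start and end time in seconds plus one caption. `Event` and `to_story` are hypothetical names for illustration and do not come from any of the listed papers. The sketch does not detect events or generate captions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    """Hypothetical record for one detected event: a [start, end) segment in seconds and its caption."""
    start: float
    end: float
    caption: str

    def __post_init__(self) -> None:
        # Reject empty or reversed segments; a real system may use other conventions.
        if not (0.0 <= self.start < self.end):
            raise ValueError(f"invalid segment [{self.start}, {self.end})")


def to_story(events: List[Event]) -> str:
    """Order events by onset (ties broken by end time) and join their captions into one paragraph."""
    ordered = sorted(events, key=lambda e: (e.start, e.end))
    sentences = []
    for e in ordered:
        text = e.caption.strip()
        if not text:
            continue  # skip events with no caption
        # Capitalize the first letter and make sure the sentence ends with punctuation.
        text = text[0].upper() + text[1:]
        if not text.endswith((".", "!", "?")):
            text += "."
        sentences.append(text)
    return " ".join(sentences)


events = [
    Event(12.0, 20.5, "the man throws the ball"),
    Event(0.0, 12.0, "a man picks up a ball"),
]
print(to_story(events))  # A man picks up a ball. The man throws the ball.
```

Sorting by `(start, end)` narrates overlapping events by when they begin. This is one simple choice. Systems that aim for coherent multi-sentence output may reorder or merge events differently.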

Papers

Showing 76-100 of 104 papers

Title | Status | Hype
--- | --- | ---
MSVD-Turkish: A Comprehensive Multimodal Dataset for Integrated Vision and Language Research in Turkish | | 0
Multi-Layer Content Interaction Through Quaternion Product For Visual Question Answering | | 0
Multimodal Neural Machine Translation for Low-resource Language Pairs using Synthetic Data | | 0
Multi-modal News Understanding with Professionally Labelled Videos (ReutersViLNews) | | 0
Multi Sentence Description of Complex Manipulation Action Videos | | 0
NarrationBot and InfoBot: A Hybrid System for Automated Video Description | | 0
Natural Language Descriptions of Human Activities Scenes: Corpus Generation and Analysis | | 0
Neural Headline Generation on Abstract Meaning Representation | | 0
Noisy Parallel Approximate Decoding for Conditional Recurrent Language Model | | 0
Probabilistic Soft Logic for Semantic Textual Similarity | | 0
PV-VTT: A Privacy-Centric Dataset for Mission-Specific Anomaly Detection and Natural Language Interpretation | | 0
JMI at SemEval 2024 Task 3: Two-step approach for multimodal ECAC using in-context learning with GPT and instruction-tuned Llama models | Code | 0
Predicting Visual Features from Text for Image and Video Caption Retrieval | Code | 0
Describing Videos by Exploiting Temporal Structure | Code | 0
Learn to Understand Negation in Video Retrieval | Code | 0
Describing Unseen Videos via Multi-Modal Cooperative Dialog Agents | Code | 0
Memory-augmented Attention Modelling for Videos | Code | 0
TGIF: A New Dataset and Benchmark on Animated GIF Description | Code | 0
MSVD-Indonesian: A Benchmark for Multimodal Video-Text Tasks in Indonesian | Code | 0
Adversarial Inference for Multi-Sentence Video Description | Code | 0
Egocentric Video Description based on Temporally-Linked Sequences | Code | 0
Video Description using Bidirectional Recurrent Neural Networks | Code | 0
Edit As You Wish: Video Caption Editing with Multi-grained User Control | Code | 0
Implicit Location-Caption Alignment via Complementary Masking for Weakly-Supervised Dense Video Captioning | Code | 0
SUSTechGAN: Image Generation for Object Detection in Adverse Conditions of Autonomous Driving | Code | 0

No leaderboard results yet.