
Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 951-1000 of 1149 papers

Title | Status | Hype
Temporally-Weighted Hierarchical Clustering for Unsupervised Action Segmentation | Code | 1
ClawCraneNet: Leveraging Object-level Relation for Text-based Video Segmentation | - | 0
Enhancing Transformer for Video Understanding Using Gated Multi-Level Attention and Temporal Adversarial Training | - | 0
PcmNet: Position-Sensitive Context Modeling Network for Temporal Action Localization | - | 0
Unsupervised Motion Representation Enhanced Network for Action Recognition | - | 0
Win-Fail Action Recognition | Code | 0
Learning Self-Similarity in Space and Time as Generalized Motion for Video Action Recognition | Code | 1
Is Space-Time Attention All You Need for Video Understanding? | Code | 2
Relaxed Transformer Decoders for Direct Action Proposal Generation | Code | 1
Occluded Video Instance Segmentation: A Benchmark | Code | 1
TCLR: Temporal Contrastive Learning for Video Representation | Code | 1
TrackFormer: Multi-Object Tracking with Transformers | Code | 1
CAG-QIL: Context-Aware Actionness Grouping via Q Imitation Learning for Online Temporal Action Localization | - | 0
Attention Is Not Enough: Mitigating the Distribution Discrepancy in Asynchronous Multimodal Sequence Fusion | - | 0
Global Self-Attention Networks | - | 0
Learning Self-Similarity in Space and Time as a Generalized Motion for Action Recognition | Code | 1
Cross-Attentional Audio-Visual Fusion for Weakly-Supervised Action Localization | - | 0
A Comprehensive Study of Deep Video Action Recognition | Code | 1
Understanding Action Sequences based on Video Captioning for Learning-from-Observation | - | 0
End-to-End Video Instance Segmentation with Transformers | Code | 1
t-EVA: Time-Efficient t-SNE Video Annotation | - | 0
SoccerNet-v2: A Dataset and Benchmarks for Holistic Understanding of Broadcast Soccer Videos | Code | 1
Can Temporal Information Help with Contrastive Self-Supervised Learning? | - | 0
QuerYD: A video dataset with high-quality text and audio narrations | Code | 1
Cycle-Contrast for Self-Supervised Video Representation Learning | - | 0
Co-attentional Transformers for Story-Based Video Understanding | - | 0
Improved Actor Relation Graph based Group Activity Recognition | Code | 1
Egok360: A 360 Egocentric Kinetic Human Activity Video Dataset | - | 0
Video Action Understanding | Code | 0
Global Self-Attention Networks for Image Recognition | - | 0
Features Understanding in 3D CNNs for Actions Recognition in Video | Code | 0
PAN: Towards Fast Action Recognition via Learning Persistence of Appearance | Code | 1
Residual Frames with Efficient Pseudo-3D CNN for Human Action Recognition | - | 0
The End-of-End-to-End: A Video Understanding Pentathlon Challenge (2020) | Code | 1
Self-supervised Motion Representation via Scattering Local Motion Cues | - | 0
Uncertainty-based Traffic Accident Anticipation with Spatio-Temporal Relational Learning | Code | 1
Detection and Localization of Robotic Tools in Robot-Assisted Surgery Videos Using Deep Neural Networks for Region Proposal and Detection | - | 0
Perceptron Synthesis Network: Rethinking the Action Scale Variances in Videos | - | 0
MovieNet: A Holistic Dataset for Movie Understanding | - | 0
MotionSqueeze: Neural Motion Feature Learning for Video Understanding | Code | 1
Auto-captions on GIF: A Large-scale Video-sentence Dataset for Vision-language Pre-training | - | 0
Video Moment Localization using Object Evidence and Reverse Captioning | Code | 1
Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization | Code | 1
Video Understanding as Machine Translation | - | 0
Large Scale Video Representation Learning via Relational Graph Clustering | - | 0
Screencast Tutorial Video Understanding | Code | 0
Temporal Aggregate Representations for Long-Range Video Understanding | Code | 1
CARPe Posterum: A Convolutional Approach for Real-time Pedestrian Path Prediction | Code | 0
DramaQA: Character-Centered Video Story Understanding with Hierarchical QA | Code | 0
CATER: A diagnostic dataset for Compositional Actions & TEmporal Reasoning | - | 0

No leaderboard results yet.