SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 951-975 of 1149 papers

Title | Status | Hype
Temporally-Weighted Hierarchical Clustering for Unsupervised Action Segmentation | Code | 1
ClawCraneNet: Leveraging Object-level Relation for Text-based Video Segmentation | - | 0
Enhancing Transformer for Video Understanding Using Gated Multi-Level Attention and Temporal Adversarial Training | - | 0
PcmNet: Position-Sensitive Context Modeling Network for Temporal Action Localization | - | 0
Unsupervised Motion Representation Enhanced Network for Action Recognition | - | 0
Win-Fail Action Recognition | Code | 0
Learning Self-Similarity in Space and Time as Generalized Motion for Video Action Recognition | Code | 1
Is Space-Time Attention All You Need for Video Understanding? | Code | 2
Relaxed Transformer Decoders for Direct Action Proposal Generation | Code | 1
Occluded Video Instance Segmentation: A Benchmark | Code | 1
TCLR: Temporal Contrastive Learning for Video Representation | Code | 1
TrackFormer: Multi-Object Tracking with Transformers | Code | 1
CAG-QIL: Context-Aware Actionness Grouping via Q Imitation Learning for Online Temporal Action Localization | - | 0
Attention Is Not Enough: Mitigating the Distribution Discrepancy in Asynchronous Multimodal Sequence Fusion | - | 0
Global Self-Attention Networks | - | 0
Learning Self-Similarity in Space and Time as a Generalized Motion for Action Recognition | Code | 1
Cross-Attentional Audio-Visual Fusion for Weakly-Supervised Action Localization | - | 0
A Comprehensive Study of Deep Video Action Recognition | Code | 1
Understanding Action Sequences based on Video Captioning for Learning-from-Observation | - | 0
End-to-End Video Instance Segmentation with Transformers | Code | 1
t-EVA: Time-Efficient t-SNE Video Annotation | - | 0
SoccerNet-v2: A Dataset and Benchmarks for Holistic Understanding of Broadcast Soccer Videos | Code | 1
Can Temporal Information Help with Contrastive Self-Supervised Learning? | - | 0
QuerYD: A video dataset with high-quality text and audio narrations | Code | 1
Cycle-Contrast for Self-Supervised Video Representation Learning | - | 0

No leaderboard results yet.