SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 851-875 of 1149 papers

Title | Status | Hype
Progressive Attention on Multi-Level Dense Difference Maps for Generic Event Boundary Detection | Code | 1
Auto-X3D: Ultra-Efficient Video Understanding via Finer-Grained Neural Architecture Search | - | 0
Prompting Visual-Language Models for Efficient Video Understanding | Code | 1
Suppressing Static Visual Cues via Normalizing Flows for Self-Supervised Video Representation Learning | Code | 0
Stacked Temporal Attention: Improving First-person Action Recognition by Emphasizing Discriminative Clips | - | 0
TokenLearner: Adaptive Space-Time Tokenization for Videos | Code | 1
LiVLR: A Lightweight Visual-Linguistic Reasoning Framework for Video Question Answering | - | 0
UBoCo : Unsupervised Boundary Contrastive Learning for Generic Event Boundary Detection | - | 0
End-to-End Referring Video Object Segmentation with Multimodal Transformers | Code | 1
SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning | Code | 1
VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling | Code | 1
MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing | Code | 1
PyTorchVideo: A Deep Learning Library for Video Understanding | Code | 2
Fill-in-the-Blank: A Challenging Video Understanding Evaluation Framework | - | 0
Occluded Video Instance Segmentation: Dataset and ICCV 2021 Challenge | - | 0
Attention Mechanisms in Computer Vision: A Survey | Code | 2
Relational Self-Attention: What's Missing in Attention for Video Understanding | Code | 1
Revisiting spatio-temporal layouts for compositional action recognition | Code | 1
Re-ID-AR: Improved Person Re-identification in Video via Joint Weakly Supervised Action Recognition | Code | 0
Gradient Frequency Modulation for Visually Explaining Video Understanding Models | - | 0
Distantly Supervised Semantic Text Detection and Recognition for Broadcast Sports Videos Understanding | - | 0
Can't Fool Me: Adversarially Robust Transformer for Video Understanding | - | 0
Leveraging Local Temporal Information for Multimodal Scene Classification | - | 0
Benchmarking the Robustness of Spatial-Temporal Models Against Corruptions | Code | 1
CLIP4Caption: CLIP for Video Caption | - | 0

No leaderboard results yet.