SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 226-250 of 1149 papers

| Title | Status | Hype |
| --- | --- | --- |
| Do Language Models Understand Time? | Code | 1 |
| Masked Spatio-Temporal Structure Prediction for Self-supervised Learning on Point Cloud Videos | Code | 1 |
| Localizing Moments in Long Video Via Multimodal Guidance | Code | 1 |
| Long Movie Clip Classification with State-Space Video Models | Code | 1 |
| Lightweight Network Architecture for Real-Time Action Recognition | Code | 1 |
| Learning Transferable Spatiotemporal Representations from Natural Script Knowledge | Code | 1 |
| Learning the Predictability of the Future | Code | 1 |
| Procedure-Aware Pretraining for Instructional Video Understanding | Code | 1 |
| Progressive Attention on Multi-Level Dense Difference Maps for Generic Event Boundary Detection | Code | 1 |
| Learning Temporally Latent Causal Processes from General Temporal Data | Code | 1 |
| Leveraging triplet loss for unsupervised action segmentation | Code | 1 |
| MCAM: Multimodal Causal Analysis Model for Ego-Vehicle-Level Driving Video Understanding | Code | 1 |
| Event-Free Moving Object Segmentation from Moving Ego Vehicle | Code | 1 |
| A Multi-Person Video Dataset Annotation Method of Spatio-Temporally Actions | Code | 1 |
| Dual-path Adaptation from Image to Video Transformers | Code | 1 |
| Learning Optical Flow with Adaptive Graph Reasoning | Code | 1 |
| Relational Self-Attention: What's Missing in Attention for Video Understanding | Code | 1 |
| Learning Salient Boundary Feature for Anchor-free Temporal Action Localization | Code | 1 |
| REVECA -- Rich Encoder-decoder framework for Video Event CAptioner | Code | 1 |
| Language-Guided Audio-Visual Learning for Long-Term Sports Assessment | Code | 1 |
| Compositional Video Understanding with Spatiotemporal Structure-based Transformers | Code | 1 |
| Language Repository for Long Video Understanding | Code | 1 |
| Learning Self-Similarity in Space and Time as a Generalized Motion for Action Recognition | Code | 1 |
| BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding | Code | 1 |
| IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs | Code | 1 |

No leaderboard results yet.