SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 201-225 of 1149 papers

Title | Status | Hype
Action Scene Graphs for Long-Form Understanding of Egocentric Videos | Code | 1
MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing | Code | 1
Modeling Video As Stochastic Processes for Fine-Grained Video Representation Learning | Code | 1
Crossover Learning for Fast Online Video Instance Segmentation | Code | 1
Localizing Moments in Long Video Via Multimodal Guidance | Code | 1
Long Movie Clip Classification with State-Space Video Models | Code | 1
Lightweight Network Architecture for Real-Time Action Recognition | Code | 1
Contrastive Spatio-Temporal Pretext Learning for Self-supervised Video Representation | Code | 1
Contrastive Masked Autoencoders for Self-Supervised Video Hashing | Code | 1
Learning the Predictability of the Future | Code | 1
Learning Temporally Latent Causal Processes from General Temporal Data | Code | 1
DEVIAS: Learning Disentangled Video Representations of Action and Scene | Code | 1
Multimodal Distillation for Egocentric Action Recognition | Code | 1
Differentiable Task Graph Learning: Procedural Activity Representation and Online Mistake Detection from Egocentric Videos | Code | 1
Learning Transferable Spatiotemporal Representations from Natural Script Knowledge | Code | 1
Learning Self-Similarity in Space and Time as Generalized Motion for Video Action Recognition | Code | 1
Dual-path Adaptation from Image to Video Transformers | Code | 1
Disentangle Your Dense Object Detector | Code | 1
Learning Temporally Causal Latent Processes from General Temporal Data | Code | 1
DisTime: Distribution-based Time Representation for Video Large Language Models | Code | 1
BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection | Code | 1
Object-Region Video Transformers | Code | 1
Learning Optical Flow with Adaptive Graph Reasoning | Code | 1
Language Repository for Long Video Understanding | Code | 1
Learning Salient Boundary Feature for Anchor-free Temporal Action Localization | Code | 1

No leaderboard results yet.