SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 161-170 of 1149 papers

| Title | Status | Hype |
| --- | --- | --- |
| Masked Spatio-Temporal Structure Prediction for Self-supervised Learning on Point Cloud Videos | Code | 1 |
| Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization | Code | 1 |
| M^3-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation | Code | 1 |
| A Multigrid Method for Efficiently Training Video Models | Code | 1 |
| HAT: History-Augmented Anchor Transformer for Online Temporal Action Localization | Code | 1 |
| Benchmarking the Robustness of Spatial-Temporal Models Against Corruptions | Code | 1 |
| Grounded Question-Answering in Long Egocentric Videos | Code | 1 |
| From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding | Code | 1 |
| Mamba4D: Efficient 4D Point Cloud Video Understanding with Disentangled Spatial-Temporal State Space Models | Code | 1 |
| A Simple and Efficient Pipeline to Build an End-to-End Spatial-Temporal Action Detector | Code | 1 |

No leaderboard results yet.