SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 176-200 of 1149 papers

Title | Status | Hype
A Simple and Efficient Pipeline to Build an End-to-End Spatial-Temporal Action Detector | Code | 1
AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions | Code | 1
MH-DETR: Video Moment and Highlight Detection with Cross-modal Transformer | Code | 1
MMAD: Multi-label Micro-Action Detection in Videos | Code | 1
AutoVideo: An Automated Video Action Recognition System | Code | 1
Masked Spatio-Temporal Structure Prediction for Self-supervised Learning on Point Cloud Videos | Code | 1
Action Scene Graphs for Long-Form Understanding of Egocentric Videos | Code | 1
Mamba4D: Efficient 4D Point Cloud Video Understanding with Disentangled Spatial-Temporal State Space Models | Code | 1
MammAlps: A multi-view video behavior monitoring dataset of wild mammals in the Swiss Alps | Code | 1
MCAM: Multimodal Causal Analysis Model for Ego-Vehicle-Level Driving Video Understanding | Code | 1
CyberV: Cybernetics for Test-time Scaling in Video Understanding | Code | 1
Learning Video Context as Interleaved Multimodal Sequences | Code | 1
M^3-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation | Code | 1
Agentic Keyframe Search for Video Question Answering | Code | 1
Long Movie Clip Classification with State-Space Video Models | Code | 1
MECD+: Unlocking Event-Level Causal Graph Discovery for Video Reasoning | Code | 1
Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning | Code | 1
Crossover Learning for Fast Online Video Instance Segmentation | Code | 1
Learning Transferable Spatiotemporal Representations from Natural Script Knowledge | Code | 1
Learning Temporally Latent Causal Processes from General Temporal Data | Code | 1
Learning Temporally Causal Latent Processes from General Temporal Data | Code | 1
Learning the Predictability of the Future | Code | 1
LoVR: A Benchmark for Long Video Retrieval in Multimodal Contexts | Code | 1
Leveraging triplet loss for unsupervised action segmentation | Code | 1
Learning Salient Boundary Feature for Anchor-free Temporal Action Localization | Code | 1

No leaderboard results yet.