SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 151-175 of 1149 papers

Title | Status | Hype
BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding | Code | 1
A Simple and Efficient Pipeline to Build an End-to-End Spatial-Temporal Action Detector | Code | 1
MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing | Code | 1
MCAM: Multimodal Causal Analysis Model for Ego-Vehicle-Level Driving Video Understanding | Code | 1
Masked Spatio-Temporal Structure Prediction for Self-supervised Learning on Point Cloud Videos | Code | 1
MECD+: Unlocking Event-Level Causal Graph Discovery for Video Reasoning | Code | 1
Mamba4D: Efficient 4D Point Cloud Video Understanding with Disentangled Spatial-Temporal State Space Models | Code | 1
A Multi-Person Video Dataset Annotation Method of Spatio-Temporally Actions | Code | 1
MammAlps: A multi-view video behavior monitoring dataset of wild mammals in the Swiss Alps | Code | 1
MH-DETR: Video Moment and Highlight Detection with Cross-modal Transformer | Code | 1
MM-VID: Advancing Video Understanding with GPT-4V(ision) | Code | 1
Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization | Code | 1
A Multigrid Method for Efficiently Training Video Models | Code | 1
LoVR: A Benchmark for Long Video Retrieval in Multimodal Contexts | Code | 1
M^3-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation | Code | 1
Benchmarking the Robustness of Spatial-Temporal Models Against Corruptions | Code | 1
Long Movie Clip Classification with State-Space Video Models | Code | 1
BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection | Code | 1
Lightweight Network Architecture for Real-Time Action Recognition | Code | 1
AVS-Mamba: Exploring Temporal and Multi-modal Mamba for Audio-Visual Segmentation | Code | 1
AlanaVLM: A Multimodal Embodied AI Foundation Model for Egocentric Video Understanding | Code | 1
Leveraging triplet loss for unsupervised action segmentation | Code | 1
Learning the Predictability of the Future | Code | 1
Learning Temporally Latent Causal Processes from General Temporal Data | Code | 1
Learning Transferable Spatiotemporal Representations from Natural Script Knowledge | Code | 1

No leaderboard results yet.