SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 171-180 of 1149 papers

Title | Status | Hype
A Simple and Efficient Pipeline to Build an End-to-End Spatial-Temporal Action Detector | Code | 1
MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing | Code | 1
MOMA-LRG: Language-Refined Graphs for Multi-Object Multi-Actor Activity Parsing | Code | 1
BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection | Code | 1
AVS-Mamba: Exploring Temporal and Multi-modal Mamba for Audio-Visual Segmentation | Code | 1
AlanaVLM: A Multimodal Embodied AI Foundation Model for Egocentric Video Understanding | Code | 1
Grounded Question-Answering in Long Egocentric Videos | Code | 1
MECD+: Unlocking Event-Level Causal Graph Discovery for Video Reasoning | Code | 1
HAT: History-Augmented Anchor Transformer for Online Temporal Action Localization | Code | 1
Masked Spatio-Temporal Structure Prediction for Self-supervised Learning on Point Cloud Videos | Code | 1

No leaderboard results yet.