SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 301-325 of 1149 papers

Title | Status | Hype
Masked Spatio-Temporal Structure Prediction for Self-supervised Learning on Point Cloud Videos | Code | 1
MCAM: Multimodal Causal Analysis Model for Ego-Vehicle-Level Driving Video Understanding | Code | 1
Towards Visually Explaining Video Understanding Networks with Perturbation | Code | 1
M^3-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation | Code | 1
ETAD: Training Action Detection End to End on a Laptop | Code | 1
Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation | Code | 1
EPIC Fields: Marrying 3D Geometry and Video Understanding | Code | 1
Long Movie Clip Classification with State-Space Video Models | Code | 1
MMAD: Multi-label Micro-Action Detection in Videos | Code | 1
MM-Pyramid: Multimodal Pyramid Attentional Network for Audio-Visual Event Localization and Video Parsing | Code | 1
FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding | Code | 1
Enhancing Traffic Safety with Parallel Dense Video Captioning for End-to-End Event Analysis | Code | 1
Localizing Moments in Long Video Via Multimodal Guidance | Code | 1
Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization | Code | 1
F^3Set: Towards Analyzing Fast, Frequent, and Fine-grained Events from Videos | Code | 1
Facial Dynamics in Video: Instruction Tuning for Improved Facial Expression Perception and Contextual Awareness | Code | 1
FineAction: A Fine-Grained Video Dataset for Temporal Action Localization | Code | 1
A Simple and Efficient Pipeline to Build an End-to-End Spatial-Temporal Action Detector | Code | 1
Multimodal Long Video Modeling Based on Temporal Dynamic Context | Code | 1
Learning Transferable Spatiotemporal Representations from Natural Script Knowledge | Code | 1
Learning the Predictability of the Future | Code | 1
Learning Temporally Causal Latent Processes from General Temporal Data | Code | 1
End-to-End Video Instance Segmentation with Transformers | Code | 1
Federated Self-supervised Learning for Video Understanding | Code | 1
Learning Temporally Latent Causal Processes from General Temporal Data | Code | 1

No leaderboard results yet.