SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 326–350 of 1149 papers

Title | Status | Hype
Streaming Video Temporal Action Segmentation In Real Time | Code | 1
Hypergraph Multi-modal Large Language Model: Exploiting EEG and Eye-tracking Modalities to Evaluate Heterogeneous Responses for Video Understanding | Code | 1
End-to-end Temporal Action Detection with Transformer | Code | 1
InfiniBench: A Comprehensive Benchmark for Large Multimodal Models in Very Long Video Understanding | Code | 1
End-to-End Streaming Video Temporal Action Segmentation with Reinforce Learning | Code | 1
FineAction: A Fine-Grained Video Dataset for Temporal Action Localization | Code | 1
Mamba4D: Efficient 4D Point Cloud Video Understanding with Disentangled Spatial-Temporal State Space Models | Code | 1
MCAM: Multimodal Causal Analysis Model for Ego-Vehicle-Level Driving Video Understanding | Code | 1
Masked Spatio-Temporal Structure Prediction for Self-supervised Learning on Point Cloud Videos | Code | 1
End-to-End Referring Video Object Segmentation with Multimodal Transformers | Code | 1
Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models | Code | 1
InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges | Code | 1
How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation | Code | 1
CAMEL-Bench: A Comprehensive Arabic LMM Benchmark | Code | 1
STOP: Integrated Spatial-Temporal Dynamic Prompting for Video Understanding | Code | 1
MMAD: Multi-label Micro-Action Detection in Videos | Code | 1
How Severe is Benchmark-Sensitivity in Video Self-Supervised Learning? | Code | 1
Compositional Video Understanding with Spatiotemporal Structure-based Transformers | Code | 1
An overview on the evaluated video retrieval tasks at TRECVID 2022 | Code | 1
Stochastic Image-to-Video Synthesis using cINNs | Code | 1
Hier-EgoPack: Hierarchical Egocentric Video Understanding with Diverse Task Perspectives | Code | 1
Large Scale Holistic Video Understanding | Code | 1
Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning | Code | 1
FrameExit: Conditional Early Exiting for Efficient Video Recognition | Code | 1
A Comprehensive Study of Deep Video Action Recognition | Code | 1

No leaderboard results yet.