SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 511–520 of 1149 papers

Title | Status | Hype
NoisyActions2M: A Multimedia Dataset for Video Understanding from Noisy Labels | Code | 0
MOFO: MOtion FOcused Self-Supervision for Video Understanding | Code | 0
MOD: A Deep Mixture Model with Online Knowledge Distillation for Large Scale Video Temporal Concept Localization | Code | 0
End-to-End Learning of Motion Representation for Video Understanding | Code | 0
MINOTAUR: Multi-task Video Grounding From Multimodal Queries | Code | 0
METok: Multi-Stage Event-based Token Compression for Efficient Long Video Understanding | Code | 0
B-VLLM: A Vision Large Language Model with Balanced Spatio-Temporal Tokens | Code | 0
EgoVLM: Policy Optimization for Egocentric Video Understanding | Code | 0
Masked Autoencoders for Egocentric Video Understanding @ Ego4D Challenge 2022 | Code | 0
Bridging Perspectives: A Survey on Cross-view Collaborative Intelligence with Egocentric-Exocentric Vision | Code | 0
Page 52 of 115

No leaderboard results yet.