
Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 61-70 of 1149 papers

Title | Status | Hype
Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models | Code | 3
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training | Code | 3
Omni-Video: Democratizing Unified Video Understanding and Generation | Code | 2
LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs | Code | 2
video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models | Code | 2
VideoDeepResearch: Long Video Understanding With Agentic Tool Using | Code | 2
Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency | Code | 2
One Trajectory, One Token: Grounded Video Tokenization via Panoptic Sub-object Trajectory | Code | 2
VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models | Code | 2
QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design | Code | 2

No leaderboard results yet.