
Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 131–140 of 1149 papers

Title | Status | Hype
InstructionBench: An Instructional Video Understanding Benchmark | - | 0
Learning Audio-guided Video Representation with Gated Attention for Video-Text Retrieval | - | 0
Scaling Video-Language Models to 10K Frames via Hierarchical Differential Distillation | Code | 2
Re-thinking Temporal Search for Long-Form Video Understanding | Code | 2
Moment Quantization for Video Temporal Grounding | - | 0
Aligned Better, Listen Better for Audio-Visual Large Language Models | - | 0
TimeSearch: Hierarchical Video Search with Spotlight and Reflection for Human-like Long Video Understanding | - | 0
Slow-Fast Architecture for Video Multi-Modal Large Language Models | Code | 1
SpaceR: Reinforcing MLLMs in Video Spatial Reasoning | Code | 2
Is Temporal Prompting All We Need For Limited Labeled Action Recognition? | - | 0
Page 14 of 115

No leaderboard results yet.