SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 171–180 of 1149 papers

Title | Status | Hype
What can Off-the-Shelves Large Multi-Modal Models do for Dynamic Scene Graph Generation? | - | 0
FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding | - | 0
Improving LLM Video Understanding with 16 Frames Per Second | - | 0
SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability | - | 0
Impossible Videos | - | 0
ViSpeak: Visual Instruction Feedback in Streaming Videos | Code | 2
VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning | Code | 3
Long-VMNet: Accelerating Long-Form Video Understanding via Fixed Memory | - | 0
Logic-in-Frames: Dynamic Keyframe Search via Visual Semantic-Logical Verification for Long Video Understanding | - | 0
Towards Scalable Modeling of Compressed Videos for Efficient Action Recognition | - | 0

No leaderboard results yet.