SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 221–230 of 1149 papers

Title | Status | Hype
VRoPE: Rotary Position Embedding for Video Large Language Models | Code | 1
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model | Code | 1
Semantics-aware Test-time Adaptation for 3D Human Pose Estimation | | 0
SVBench: A Benchmark with Temporal Multi-Turn Dialogues for Streaming Video Understanding | Code | 2
Optimizing GPT for Video Understanding: Zero-Shot Performance and Prompt Engineering | | 0
A Survey on Mamba Architecture for Vision Applications | | 0
Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis | | 0
CoS: Chain-of-Shot Prompting for Long Video Understanding | | 0
A Survey on Video Analytics in Cloud-Edge-Terminal Collaborative Systems | | 0
Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy | Code | 3

No leaderboard results yet.