SOTAVerified

Video Understanding

A crucial task in video understanding is to recognise and localise, in both space and time, the different actions or events appearing in a video.

Source: Action Detection from a Robot-Car Perspective
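To make the task description concrete, here is a minimal sketch (not taken from any listed paper) of what a spatio-temporal action detection might look like as data: a hypothetical `ActionTube` holding an action label, a temporal extent, and per-frame bounding boxes, together with the temporal intersection-over-union commonly used to match predictions against ground truth.

```python
from dataclasses import dataclass, field

@dataclass
class ActionTube:
    """Hypothetical container for one spatio-temporal detection:
    an action label, a temporal extent, and per-frame boxes."""
    label: str
    t_start: float  # seconds
    t_end: float    # seconds
    boxes: dict = field(default_factory=dict)  # frame index -> (x1, y1, x2, y2)

def temporal_iou(a: ActionTube, b: ActionTube) -> float:
    """Intersection-over-union of two temporal intervals,
    a standard matching criterion for temporal localisation."""
    inter = max(0.0, min(a.t_end, b.t_end) - max(a.t_start, b.t_start))
    union = (a.t_end - a.t_start) + (b.t_end - b.t_start) - inter
    return inter / union if union > 0 else 0.0

pred = ActionTube("crossing", 2.0, 6.0)
gt = ActionTube("crossing", 3.0, 7.0)
print(round(temporal_iou(pred, gt), 2))  # 0.6
```

A prediction is typically counted as correct when its label matches and its temporal (or spatio-temporal) IoU with a ground-truth instance exceeds a threshold such as 0.5.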

Papers

Showing 181–190 of 1149 papers

Title | Status | Hype
TimeLoc: A Unified End-to-End Framework for Precise Timestamp Localization in Long Videos | Code | 1
Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning | Code | 1
Task Graph Maximum Likelihood Estimation for Procedural Activity Understanding in Egocentric Videos | Code | 1
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model | Code | 1
VRoPE: Rotary Position Embedding for Video Large Language Models | Code | 1
TUMTraffic-VideoQA: A Benchmark for Unified Spatio-Temporal Video Understanding in Traffic Scenes | Code | 1
Hier-EgoPack: Hierarchical Egocentric Video Understanding with Diverse Task Perspectives | Code | 1
∞-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation | Code | 1
Facial Dynamics in Video: Instruction Tuning for Improved Facial Expression Perception and Contextual Awareness | Code | 1
AVS-Mamba: Exploring Temporal and Multi-modal Mamba for Audio-Visual Segmentation | Code | 1
Page 19 of 115

No leaderboard results yet.