SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 431–440 of 1149 papers

Title | Status | Hype
METok: Multi-Stage Event-based Token Compression for Efficient Long Video Understanding | Code | 0
InterRVOS: Interaction-aware Referring Video Object Segmentation | - | 0
EgoVLM: Policy Optimization for Egocentric Video Understanding | Code | 0
ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding | - | 0
FlexSelect: Flexible Token Selection for Efficient Long Video Understanding | - | 0
Scene Detection Policies and Keyframe Extraction Strategies for Large-Scale Video Analysis | - | 0
Learning reusable concepts across different egocentric video understanding tasks | - | 0
Threading Keyframe with Narratives: MLLMs as Strong Long Video Comprehenders | - | 0
Time Blindness: Why Video-Language Models Can't See What Humans Can? | - | 0
VUDG: A Dataset for Video Understanding Domain Generalization | - | 0

No leaderboard results yet.