SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 41-50 of 1149 papers

Title | Status | Hype
AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs | - | 0
DynTok: Dynamic Compression of Visual Tokens for Efficient and Effective Video Understanding | - | 0
METok: Multi-Stage Event-based Token Compression for Efficient Long Video Understanding | Code | 0
EgoVLM: Policy Optimization for Egocentric Video Understanding | Code | 0
InterRVOS: Interaction-aware Referring Video Object Segmentation | - | 0
ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding | - | 0
Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency | Code | 2
FlexSelect: Flexible Token Selection for Efficient Long Video Understanding | - | 0
Scene Detection Policies and Keyframe Extraction Strategies for Large-Scale Video Analysis | - | 0
SiLVR: A Simple Language-based Video Reasoning Framework | Code | 1

No leaderboard results yet.