SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 201-210 of 1149 papers

Title | Status | Hype
BEARCUBS: A benchmark for computer-using web agents | | 0
ALLVB: All-in-One Long Video Understanding Benchmark | | 0
Towards Fine-Grained Video Question Answering | | 0
TimeLoc: A Unified End-to-End Framework for Precise Timestamp Localization in Long Videos | Code | 1
Unified Reward Model for Multimodal Understanding and Generation | Code | 4
EgoLife: Towards Egocentric Life Assistant | Code | 3
Towards Visual Discrimination and Reasoning of Real-World Physical Dynamics: Physics-Grounded Anomaly Detection | | 0
Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning | Code | 1
HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models | | 0
PreMind: Multi-Agent Video Understanding for Advanced Indexing of Presentation-style Videos | | 0

No leaderboard results yet.