SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 201–225 of 1149 papers

| Title | Status | Hype |
|---|---|---|
| BEARCUBS: A benchmark for computer-using web agents | | 0 |
| ALLVB: All-in-One Long Video Understanding Benchmark | | 0 |
| Towards Fine-Grained Video Question Answering | | 0 |
| TimeLoc: A Unified End-to-End Framework for Precise Timestamp Localization in Long Videos | Code | 1 |
| Unified Reward Model for Multimodal Understanding and Generation | Code | 4 |
| EgoLife: Towards Egocentric Life Assistant | Code | 3 |
| Towards Visual Discrimination and Reasoning of Real-World Physical Dynamics: Physics-Grounded Anomaly Detection | | 0 |
| Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning | Code | 1 |
| HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models | | 0 |
| PreMind: Multi-Agent Video Understanding for Advanced Indexing of Presentation-style Videos | | 0 |
| OpenTAD: A Unified Framework and Comprehensive Study of Temporal Action Detection | Code | 3 |
| M-LLM Based Video Frame Selection for Efficient Video Understanding | | 0 |
| InternVQA: Advancing Compressed Video Quality Assessment with Distilling Large Foundation Model | | 0 |
| An Analysis of Data Transformation Effects on Segment Anything 2 | | 0 |
| Task Graph Maximum Likelihood Estimation for Procedural Activity Understanding in Egocentric Videos | Code | 1 |
| Fine-Grained Video Captioning through Scene Graph Consolidation | | 0 |
| LongCaptioning: Unlocking the Power of Long Caption Generation in Large Multimodal Models | | 0 |
| AVD2: Accident Video Diffusion for Accident Video Description | | 0 |
| MomentSeeker: A Task-Oriented Benchmark For Long-Video Moment Retrieval | | 0 |
| video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model | Code | 1 |
| VRoPE: Rotary Position Embedding for Video Large Language Models | Code | 1 |
| iMOVE: Instance-Motion-Aware Video Understanding | | 0 |
| Semantics-aware Test-time Adaptation for 3D Human Pose Estimation | | 0 |
| SVBench: A Benchmark with Temporal Multi-Turn Dialogues for Streaming Video Understanding | Code | 2 |
| Optimizing GPT for Video Understanding: Zero-Shot Performance and Prompt Engineering | | 0 |
Page 9 of 46

No leaderboard results yet.