SOTAVerified

Video Understanding

A crucial task in video understanding is to recognise and localise, in space and time, the different actions or events that appear in a video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 481-490 of 1149 papers

Title | Status | Hype
PVUW 2025 Challenge Report: Advances in Pixel-level Understanding of Complex Videos in the Wild | | 0
OmniVDiff: Omni Controllable Video Diffusion for Generation and Understanding | | 0
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model | | 0
Towards Efficient and Robust Moment Retrieval System: A Unified Framework for Multi-Granularity Models and Temporal Reranking | | 0
SF2T: Self-supervised Fragment Finetuning of Video-LLMs for Fine-Grained Understanding | | 0
How Can Objects Help Video-Language Understanding? | | 0
VideoExpert: Augmented LLM for Temporal-Sensitive Video Understanding | | 0
From Broadcast to Minimap: Achieving State-of-the-Art SoccerNet Game State Reconstruction | | 0
From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models | | 0
InstructionBench: An Instructional Video Understanding Benchmark | | 0

No leaderboard results yet.