SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 121–130 of 1149 papers

Title | Status | Hype
Multimodal Long Video Modeling Based on Temporal Dynamic Context | Code | 1
TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning | Code | 2
F^3Set: Towards Analyzing Fast, Frequent, and Fine-grained Events from Videos | Code | 1
Towards Efficient and Robust Moment Retrieval System: A Unified Framework for Multi-Granularity Models and Temporal Reranking | - | 0
How Can Objects Help Video-Language Understanding? | - | 0
VideoExpert: Augmented LLM for Temporal-Sensitive Video Understanding | - | 0
SF2T: Self-supervised Fragment Finetuning of Video-LLMs for Fine-Grained Understanding | - | 0
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning | Code | 3
From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models | - | 0
From Broadcast to Minimap: Achieving State-of-the-Art SoccerNet Game State Reconstruction | - | 0

No leaderboard results yet.