
Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers


Showing 131–140 of 1149 papers

Title	Date	Tasks	Status	Hype
InstructionBench: An Instructional Video Understanding Benchmark	Apr 7, 2025	Common Sense Reasoning, Multiple-choice	Unverified	0
Learning Audio-guided Video Representation with Gated Attention for Video-Text Retrieval	Apr 3, 2025	Information Retrieval, Representation Learning	Unverified	0
Scaling Video-Language Models to 10K Frames via Hierarchical Differential Distillation	Apr 3, 2025	Computational Efficiency, GPU	Code Available	2
Re-thinking Temporal Search for Long-Form Video Understanding	Apr 3, 2025	Computational Efficiency, Form	Code Available	2
Moment Quantization for Video Temporal Grounding	Apr 3, 2025	Quantization, Video Understanding	Unverified	0
Aligned Better, Listen Better for Audio-Visual Large Language Models	Apr 2, 2025	Video Understanding	Unverified	0
TimeSearch: Hierarchical Video Search with Spotlight and Reflection for Human-like Long Video Understanding	Apr 2, 2025	Video Understanding	Unverified	0
Slow-Fast Architecture for Video Multi-Modal Large Language Models	Apr 2, 2025	Video Understanding	Code Available	1
SpaceR: Reinforcing MLLMs in Video Spatial Reasoning	Apr 2, 2025	MME, Spatial Reasoning	Code Available	2
Is Temporal Prompting All We Need For Limited Labeled Action Recognition?	Apr 2, 2025	Action Recognition, All	Unverified	0



No leaderboard results yet.