Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 331–340 of 1149 papers

Title	Date	Tasks	Status	Hype	Score
End-to-end Temporal Action Detection with Transformer	Jun 18, 2021	Action Detection, Temporal Action Localization	Code Available	1	5
LoVR: A Benchmark for Long Video Retrieval in Multimodal Contexts	May 20, 2025	Caption Generation, Retrieval	Code Available	1	5
M^3-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation	Jun 15, 2025	Object, Semantic Segmentation	Code Available	1	5
Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning	Dec 4, 2024	Multimodal Large Language Model, Video Understanding	Code Available	1	5
End-to-End Streaming Video Temporal Action Segmentation with Reinforce Learning	Sep 27, 2023	Action Recognition, Action Segmentation	Code Available	1	5
End-to-End Referring Video Object Segmentation with Multimodal Transformers	Nov 29, 2021	Inductive Bias, Instance Segmentation	Code Available	1	5
How Severe is Benchmark-Sensitivity in Video Self-Supervised Learning?	Mar 27, 2022	Self-Supervised Learning, Sensitivity	Code Available	1	5
Spherical Vision Transformer for 360-degree Video Saliency Prediction	Aug 24, 2023	Prediction, Saliency Prediction	Code Available	1	5
Stand-Alone Inter-Frame Attention in Video Models	Jun 14, 2022	Action Classification, Action Recognition	Code Available	1	5
CAMEL-Bench: A Comprehensive Arabic LMM Benchmark	Oct 24, 2024	document understanding, Video Understanding	Code Available	1	5

No leaderboard results yet.