Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 521–530 of 1149 papers

Title	Date	Tasks	Status	Hype	Score
EgoVLM: Policy Optimization for Egocentric Video Understanding	Jun 3, 2025	EgoSchema, Question Answering	Code Available	0	5
MINOTAUR: Multi-task Video Grounding From Multimodal Queries	Feb 16, 2023	Action Detection, Sentence	Code Available	0	5
LMM-VQA: Advancing Video Quality Assessment with Large Multimodal Models	Aug 26, 2024	Large Language Model, Video Quality Assessment	Code Available	0	5
METok: Multi-Stage Event-based Token Compression for Efficient Long Video Understanding	Jun 3, 2025	Video Understanding	Code Available	0	5
Bridging Perspectives: A Survey on Cross-view Collaborative Intelligence with Egocentric-Exocentric Vision	Jun 6, 2025	Video Understanding	Code Available	0	5
LLaVA-OneVision: Easy Visual Task Transfer	Aug 6, 2024	3D Question Answering (3D-QA)	Code Available	0	5
Masked Autoencoders for Egocentric Video Understanding @ Ego4D Challenge 2022	Nov 18, 2022	Object State Change Classification, Temporal Localization	Code Available	0	5
Long-Term Feature Banks for Detailed Video Understanding	Dec 12, 2018	Action Classification, Action Recognition	Code Available	0	5
A Challenge to Build Neuro-Symbolic Video Agents	May 20, 2025	Scene Classification, Video Retrieval	Code Available	0	5
Localizing Moments in Video with Temporal Language	Sep 5, 2018	Natural Language Queries, Retrieval	Code Available	0	5

No leaderboard results yet.