SOTAVerified

Video Understanding

A crucial task in Video Understanding is to recognise and localise, in space and time, the different actions or events appearing in a video.

Source: Action Detection from a Robot-Car Perspective
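To make "recognise and localise (in space and time)" concrete, one spatio-temporal action detection can be thought of as an action label plus a time interval and per-frame bounding boxes. A minimal sketch of such a record (the class and field names are illustrative assumptions, not any benchmark's official schema):

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ActionDetection:
    """One spatio-temporal action detection: what, when, and where.

    Illustrative schema only; field names are assumptions, not the
    format of any particular dataset or library.
    """
    label: str                       # action class, e.g. "waving"
    start_sec: float                 # temporal localisation: start time
    end_sec: float                   # temporal localisation: end time
    # spatial localisation: one (x1, y1, x2, y2) box per sampled frame
    boxes: List[Tuple[float, float, float, float]] = field(default_factory=list)
    score: float = 1.0               # detector confidence

    def duration(self) -> float:
        return self.end_sec - self.start_sec

det = ActionDetection("waving", start_sec=2.0, end_sec=4.5,
                      boxes=[(10, 20, 110, 220), (12, 21, 112, 222)],
                      score=0.87)
print(det.duration())  # → 2.5
```

A full detector would emit a list of such records per video; evaluation then compares them to ground truth with temporal and spatial IoU thresholds.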

Papers

Showing 111–120 of 1149 papers

Title | Status | Hype
OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer | Code | 2
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning | Code | 2
TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning | Code | 2
Free Video-LLM: Prompt-guided Visual Perception for Efficient Training-free Video LLMs | Code | 2
Online Video Understanding: OVBench and VideoChat-Online | Code | 2
PyTorchVideo: A Deep Learning Library for Video Understanding | Code | 2
UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer | Code | 2
Beyond MOT: Semantic Multi-Object Tracking | Code | 2
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding | Code | 2
A Content-Driven Micro-Video Recommendation Dataset at Scale | Code | 2
Page 12 of 115

No leaderboard results yet.