SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers


Showing 111–120 of 1149 papers

Title	Date	Tasks	Status	Hype
Are Vision LLMs Road-Ready? A Comprehensive Benchmark for Safety-Critical Driving Video Understanding	Apr 20, 2025	Autonomous Driving, Image Captioning	Code Available	0
How Well Can General Vision-Language Models Learn Medicine By Watching Public Educational Videos?	Apr 19, 2025	Video Understanding	Unverified	0
Prototypes are Balanced Units for Efficient and Effective Partially Relevant Video Retrieval	Apr 17, 2025	Partially Relevant Video Retrieval, Retrieval	Unverified	0
Perception Encoder: The best visual embeddings are not at the output of the network	Apr 17, 2025	Depth Estimation, Language Modeling	Code Available	8
PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding	Apr 17, 2025	Video Question Answering, Video Understanding	Code Available	7
VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models	Apr 17, 2025	Hallucination, Video Understanding	Code Available	1
Self-alignment of Large Video Language Models with Refined Regularized Preference Optimization	Apr 16, 2025	Hallucination, Question Answering	Unverified	0
PVUW 2025 Challenge Report: Advances in Pixel-level Understanding of Complex Videos in the Wild	Apr 15, 2025	Segmentation, Semantic Segmentation	Unverified	0
OmniVDiff: Omni Controllable Video Diffusion for Generation and Understanding	Apr 15, 2025	Semantic Segmentation, Video Generation	Unverified	0
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model	Apr 14, 2025	Computational Efficiency, Language Modeling	Unverified	0


No leaderboard results yet.