SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 111-120 of 1149 papers

Title | Status | Hype
Are Vision LLMs Road-Ready? A Comprehensive Benchmark for Safety-Critical Driving Video Understanding | Code | 0
How Well Can General Vision-Language Models Learn Medicine By Watching Public Educational Videos? | | 0
Prototypes are Balanced Units for Efficient and Effective Partially Relevant Video Retrieval | | 0
Perception Encoder: The best visual embeddings are not at the output of the network | Code | 8
PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding | Code | 7
VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models | Code | 1
Self-alignment of Large Video Language Models with Refined Regularized Preference Optimization | | 0
PVUW 2025 Challenge Report: Advances in Pixel-level Understanding of Complex Videos in the Wild | | 0
OmniVDiff: Omni Controllable Video Diffusion for Generation and Understanding | | 0
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model | | 0

No leaderboard results yet.