
Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 141–150 of 1149 papers

Title | Status | Hype
Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 | Code | 2
H2VU-Benchmark: A Comprehensive Benchmark for Hierarchical Holistic Video Understanding | - | 0
DANTE-AD: Dual-Vision Attention Network for Long-Term Audio Description | - | 0
CA^2ST: Cross-Attention in Audio, Space, and Time for Holistic Video Recognition | - | 0
OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts | - | 0
BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding | Code | 1
Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model | Code | 2
From Trial to Triumph: Advancing Long Video Understanding via Visual Context Sample Scaling and Self-reward Alignment | - | 0
Self-ReS: Self-Reflection in Large Vision-Language Models for Long Video Understanding | - | 0
Bootstrap Your Own Views: Masked Ego-Exo Modeling for Fine-grained View-invariant Video Representations | Code | 0

No leaderboard results yet.