SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 441-450 of 1149 papers

Title | Status | Hype
Harnessing Temporal Causality for Advanced Temporal Action Detection | Code | 3
EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval | Code | 1
LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding | Code | 2
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models | Code | 3
Audio-visual training for improved grounding in video-text LLMs | - | 0
Rethinking Video-Text Understanding: Retrieval from Counterfactually Augmented Data | - | 0
Goldfish: Vision-Language Understanding of Arbitrarily Long Videos | Code | 4
Open Vocabulary Multi-Label Video Classification | - | 0
Hypergraph Multi-modal Large Language Model: Exploiting EEG and Eye-tracking Modalities to Evaluate Heterogeneous Responses for Video Understanding | Code | 1
VideoMamba: Spatio-Temporal Selective State Space Model | Code | 1

No leaderboard results yet.