SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 276–300 of 1149 papers

Title | Status | Hype
∞-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation | Code | 1
BehAVE: Behaviour Alignment of Video Game Encodings | Code | 1
Hypergraph Multi-modal Large Language Model: Exploiting EEG and Eye-tracking Modalities to Evaluate Heterogeneous Responses for Video Understanding | Code | 1
A Simple LLM Framework for Long-Range Video Question-Answering | Code | 1
InfiniBench: A Comprehensive Benchmark for Large Multimodal Models in Very Long Video Understanding | Code | 1
CEFHRI: A Communication Efficient Federated Learning Framework for Recognizing Industrial Human-Robot Interaction | Code | 1
A Dataset for Medical Instructional Video Classification and Question Answering | Code | 1
IntentVizor: Towards Generic Query Guided Interactive Video Summarization | Code | 1
Slot State Space Models | Code | 1
SFMViT: SlowFast Meet ViT in Chaotic World | Code | 1
How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation | Code | 1
Self-supervised Learning of Echocardiographic Video Representations via Online Cluster Distillation | Code | 1
CATER: A diagnostic dataset for Compositional Actions and TEmporal Reasoning | Code | 1
CAST: Cross-Attention in Space and Time for Video Action Recognition | Code | 1
Towards Visually Explaining Video Understanding Networks with Perturbation | Code | 1
Shot2Story20K: A New Benchmark for Comprehensive Understanding of Multi-shot Videos | Code | 1
ETAD: Training Action Detection End to End on a Laptop | Code | 1
Hier-EgoPack: Hierarchical Egocentric Video Understanding with Diverse Task Perspectives | Code | 1
REVECA -- Rich Encoder-decoder framework for Video Event CAptioner | Code | 1
EPIC Fields: Marrying 3D Geometry and Video Understanding | Code | 1
Revisiting spatio-temporal layouts for compositional action recognition | Code | 1
Enhancing Traffic Safety with Parallel Dense Video Captioning for End-to-End Event Analysis | Code | 1
How Severe is Benchmark-Sensitivity in Video Self-Supervised Learning? | Code | 1
Helping Hands: An Object-Aware Ego-Centric Video Recognition Model | Code | 1
Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization | Code | 1

No leaderboard results yet.