SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 476-500 of 1149 papers

Title | Status | Hype
InstructionBench: An Instructional Video Understanding Benchmark | | 0
Distantly Supervised Semantic Text Detection and Recognition for Broadcast Sports Videos Understanding | | 0
AVT: Audio-Video Transformer for Multimodal Action Recognition | | 0
Aligned Better, Listen Better for Audio-Visual Large Language Models | | 0
Disentangle and denoise: Tackling context misalignment for video moment retrieval | | 0
InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding | | 0
AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs | | 0
Large-Scale Video Classification with Feature Space Augmentation coupled with Learned Label Relations and Ensembling | | 0
Large Scale Video Representation Learning via Relational Graph Clustering | | 0
Large-Scale YouTube-8M Video Understanding with Deep Neural Networks | | 0
LASER: A Neuro-Symbolic Framework for Learning Spatial-Temporal Scene Graphs with Weak Supervision | | 0
Dynamic Appearance: A Video Representation for Action Recognition with Joint Training | | 0
Beyond the Camera: Neural Networks in World Coordinates | | 0
Dynamic Graph Modules for Modeling Object-Object Interactions in Activity Recognition | | 0
Learning an Augmented RGB Representation with Cross-Modal Knowledge Distillation for Action Detection | | 0
Learning Audio-guided Video Representation with Gated Attention for Video-Text Retrieval | | 0
Learning Dynamic MRI Reconstruction with Convolutional Network Assisted Reconstruction Swin Transformer | | 0
Learning Dynamics via Graph Neural Networks for Human Pose Estimation and Tracking | | 0
Learning Fine-grained View-Invariant Representations from Unpaired Ego-Exo Videos via Temporal Alignment | | 0
Learning from Multiple Sources for Video Summarisation | | 0
Learning Higher-order Object Interactions for Keypoint-based Video Understanding | | 0
Inductive Attention for Video Action Anticipation | | 0
Discrete neural representations for explainable anomaly detection | | 0
Improving Video Model Transfer With Dynamic Representation Learning | | 0
Improving LLM Video Understanding with 16 Frames Per Second | | 0

No leaderboard results yet.