SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 826–850 of 1149 papers

| Title | Status | Hype |
| VideoGLUE: Video General Understanding Evaluation of Foundation Models | | 0 |
| ICSVR: Investigating Compositional and Syntactic Understanding in Video Retrieval Models | Code | 0 |
| Temporal Action Proposal Generation With Action Frequency Adaptive Network | Code | 0 |
| Learning Space-Time Semantic Correspondences | | 0 |
| Learning Fine-grained View-Invariant Representations from Unpaired Ego-Exo Videos via Temporal Alignment | | 0 |
| MoviePuzzle: Visual Narrative Reasoning through Multimodal Order Learning | | 0 |
| Teacher Agent: A Knowledge Distillation-Free Framework for Rehearsal-based Video Incremental Learning | Code | 0 |
| Action Sensitivity Learning for Temporal Action Localization | | 0 |
| Learning Higher-order Object Interactions for Keypoint-based Video Understanding | | 0 |
| A Video Is Worth 4096 Tokens: Verbalize Videos To Understand Them In Zero Shot | Code | 0 |
| Vehicle Detection and Classification without Residual Calculation: Accelerating HEVC Image Decoding with Random Perturbation Injection | | 0 |
| ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System | | 0 |
| MRSN: Multi-Relation Support Network for Video Action Detection | | 0 |
| Search-Map-Search: A Frame Selection Paradigm for Action Recognition | | 0 |
| LASER: A Neuro-Symbolic Framework for Learning Spatial-Temporal Scene Graphs with Weak Supervision | | 0 |
| Therbligs in Action: Video Understanding through Motion Primitives | | 0 |
| DOAD: Decoupled One Stage Action Detection Network | | 0 |
| SVT: Supertoken Video Transformer for Efficient Video Understanding | | 0 |
| System-status-aware Adaptive Network for Online Streaming Video Understanding | | 0 |
| Selective Structured State-Spaces for Long-Form Video Understanding | | 0 |
| Leaping Into Memories: Space-Time Deep Feature Synthesis | Code | 0 |
| Video4MRI: An Empirical Study on Brain Magnetic Resonance Image Analytics with CNN-based Video Classification Frameworks | | 0 |
| MINOTAUR: Multi-task Video Grounding From Multimodal Queries | Code | 0 |
| Semi-Parametric Video-Grounded Text Generation | | 0 |
| Building Scalable Video Understanding Benchmarks through Sports | | 0 |

No leaderboard results yet.