SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 251–300 of 1149 papers

Title | Status | Hype
STOP: Integrated Spatial-Temporal Dynamic Prompting for Video Understanding | Code | 1
EEV: A Large-Scale Dataset for Studying Evoked Expressions from Video | Code | 1
Learning Optical Flow with Adaptive Graph Reasoning | Code | 1
Learning Salient Boundary Feature for Anchor-free Temporal Action Localization | Code | 1
Learning Self-Similarity in Space and Time as a Generalized Motion for Action Recognition | Code | 1
Compositional Video Understanding with Spatiotemporal Structure-based Transformers | Code | 1
SPAct: Self-supervised Privacy Preservation for Action Recognition | Code | 1
Language Repository for Long Video Understanding | Code | 1
SoccerNet 2023 Challenges Results | Code | 1
Language-Guided Audio-Visual Learning for Long-Term Sports Assessment | Code | 1
Learning Self-Similarity in Space and Time as Generalized Motion for Video Action Recognition | Code | 1
SoccerNet-v2: A Dataset and Benchmarks for Holistic Understanding of Broadcast Soccer Videos | Code | 1
Learning Temporally Causal Latent Processes from General Temporal Data | Code | 1
SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models | Code | 1
Clover: Towards A Unified Video-Language Alignment and Fusion Model | Code | 1
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval | Code | 1
Isolated Sign Recognition from RGB Video using Pose Flow and Self-Attention | Code | 1
IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs | Code | 1
COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark | Code | 1
Is Appearance Free Action Recognition Possible? | Code | 1
Language-Assisted Skeleton Action Understanding for Skeleton-Based Temporal Action Segmentation | Code | 1
SoccerNet 2022 Challenges Results | Code | 1
Spatial-Temporal Transformer for Dynamic Scene Graph Generation | Code | 1
Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning | Code | 1
SiLVR: A Simple Language-based Video Reasoning Framework | Code | 1
∞-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation | Code | 1
BehAVE: Behaviour Alignment of Video Game Encodings | Code | 1
Hypergraph Multi-modal Large Language Model: Exploiting EEG and Eye-tracking Modalities to Evaluate Heterogeneous Responses for Video Understanding | Code | 1
A Simple LLM Framework for Long-Range Video Question-Answering | Code | 1
InfiniBench: A Comprehensive Benchmark for Large Multimodal Models in Very Long Video Understanding | Code | 1
CEFHRI: A Communication Efficient Federated Learning Framework for Recognizing Industrial Human-Robot Interaction | Code | 1
A Dataset for Medical Instructional Video Classification and Question Answering | Code | 1
IntentVizor: Towards Generic Query Guided Interactive Video Summarization | Code | 1
Slot State Space Models | Code | 1
SFMViT: SlowFast Meet ViT in Chaotic World | Code | 1
How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation | Code | 1
Self-supervised Learning of Echocardiographic Video Representations via Online Cluster Distillation | Code | 1
CATER: A diagnostic dataset for Compositional Actions and TEmporal Reasoning | Code | 1
CAST: Cross-Attention in Space and Time for Video Action Recognition | Code | 1
Towards Visually Explaining Video Understanding Networks with Perturbation | Code | 1
Shot2Story20K: A New Benchmark for Comprehensive Understanding of Multi-shot Videos | Code | 1
ETAD: Training Action Detection End to End on a Laptop | Code | 1
Hier-EgoPack: Hierarchical Egocentric Video Understanding with Diverse Task Perspectives | Code | 1
REVECA -- Rich Encoder-decoder framework for Video Event CAptioner | Code | 1
EPIC Fields: Marrying 3D Geometry and Video Understanding | Code | 1
Revisiting spatio-temporal layouts for compositional action recognition | Code | 1
Enhancing Traffic Safety with Parallel Dense Video Captioning for End-to-End Event Analysis | Code | 1
How Severe is Benchmark-Sensitivity in Video Self-Supervised Learning? | Code | 1
Helping Hands: An Object-Aware Ego-Centric Video Recognition Model | Code | 1
Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization | Code | 1

No leaderboard results yet.