SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 301-325 of 1149 papers

Title | Status | Hype
SPAct: Self-supervised Privacy Preservation for Action Recognition | Code | 1
InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges | Code | 1
EPIC Fields: Marrying 3D Geometry and Video Understanding | Code | 1
∞-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation | Code | 1
CEFHRI: A Communication Efficient Federated Learning Framework for Recognizing Industrial Human-Robot Interaction | Code | 1
Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation | Code | 1
A Simple LLM Framework for Long-Range Video Question-Answering | Code | 1
Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation | Code | 1
SoccerNet-v2: A Dataset and Benchmarks for Holistic Understanding of Broadcast Soccer Videos | Code | 1
Spotting Temporally Precise, Fine-Grained Events in Video | Code | 1
Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models | Code | 1
Enhancing Traffic Safety with Parallel Dense Video Captioning for End-to-End Event Analysis | Code | 1
Hypergraph Multi-modal Large Language Model: Exploiting EEG and Eye-tracking Modalities to Evaluate Heterogeneous Responses for Video Understanding | Code | 1
Learning Self-Similarity in Space and Time as Generalized Motion for Video Action Recognition | Code | 1
Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization | Code | 1
Facial Dynamics in Video: Instruction Tuning for Improved Facial Expression Perception and Contextual Awareness | Code | 1
Fact-R1: Towards Explainable Video Misinformation Detection with Deep Reasoning | Code | 1
Learning Optical Flow with Adaptive Graph Reasoning | Code | 1
Learning Temporally Latent Causal Processes from General Temporal Data | Code | 1
SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos | Code | 1
Leveraging triplet loss for unsupervised action segmentation | Code | 1
Feature Combination Meets Attention: Baidu Soccer Embeddings and Transformer based Temporal Detection | Code | 1
Large Scale Holistic Video Understanding | Code | 1
Slow-Fast Architecture for Video Multi-Modal Large Language Models | Code | 1
Hier-EgoPack: Hierarchical Egocentric Video Understanding with Diverse Task Perspectives | Code | 1

No leaderboard results yet.