SOTAVerified

Video Understanding

A crucial task of Video Understanding is to recognise and localise (in space and time) the different actions or events that appear in a video.

Source: Action Detection from a Robot-Car Perspective

Papers

Showing 201–225 of 1149 papers

Title | Status | Hype
Crossover Learning for Fast Online Video Instance Segmentation | Code | 1
IntentVizor: Towards Generic Query Guided Interactive Video Summarization | Code | 1
Helping Hands: An Object-Aware Ego-Centric Video Recognition Model | Code | 1
HAT: History-Augmented Anchor Transformer for Online Temporal Action Localization | Code | 1
NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions | Code | 1
No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding | Code | 1
Object-Region Video Transformers | Code | 1
Contrastive Spatio-Temporal Pretext Learning for Self-supervised Video Representation | Code | 1
Contrastive Masked Autoencoders for Self-Supervised Video Hashing | Code | 1
Hier-EgoPack: Hierarchical Egocentric Video Understanding with Diverse Task Perspectives | Code | 1
Occluded Video Instance Segmentation: A Benchmark | Code | 1
DEVIAS: Learning Disentangled Video Representations of Action and Scene | Code | 1
Panoptic Video Scene Graph Generation | Code | 1
Differentiable Task Graph Learning: Procedural Activity Representation and Online Mistake Detection from Egocentric Videos | Code | 1
From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding | Code | 1
Language-Assisted Skeleton Action Understanding for Skeleton-Based Temporal Action Segmentation | Code | 1
AVS-Mamba: Exploring Temporal and Multi-modal Mamba for Audio-Visual Segmentation | Code | 1
Disentangle Your Dense Object Detector | Code | 1
From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering | Code | 1
DisTime: Distribution-based Time Representation for Video Large Language Models | Code | 1
BasicTAD: an Astounding RGB-Only Baseline for Temporal Action Detection | Code | 1
Learning Optical Flow with Adaptive Graph Reasoning | Code | 1
From My View to Yours: Ego-Augmented Learning in Large Vision Language Models for Understanding Exocentric Daily Living Activities | Code | 1
Free Lunch for Surgical Video Understanding by Distilling Self-Supervisions | Code | 1
Grounded Question-Answering in Long Egocentric Videos | Code | 1

No leaderboard results yet.